JP3673834B2

JP3673834B2 - Gaze input communication method using eye movement

Info

Publication number: JP3673834B2
Application number: JP2004236083A
Authority: JP
Inventors: 幹也田中; 嘉樹水上; 裕治若佐
Original assignee: Yamaguchi University NUC
Current assignee: Yamaguchi University NUC
Priority date: 2003-08-18
Filing date: 2004-08-13
Publication date: 2005-07-20
Anticipated expiration: 2024-08-13
Also published as: JP2005100366A

Abstract

PROBLEM TO BE SOLVED: To construct a gaze input communication system using eye movement that makes working from home possible. A video camera acquires a face image of the patient, and image processing detects the patient's gaze direction without contact, so that the patient can select an intended item on a display screen. Input uses eye function alone (eye movement and the opening and closing of the eyelids), and a virtual board is operated through switching operations.

SOLUTION: A difference image is obtained, by having the subject open and close his or her eyes, from images that capture the subject's entire face. Templates of the eyes and eyebrows are then registered, and the gaze direction is detected by comparing data obtained during calibration with data obtained from the input image.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a gaze input communication system using eye movement that supports communication between patients with severe amyotrophic lateral sclerosis (hereinafter, ALS) and similar conditions and their caregivers, family members, and others.

Research on estimating gaze direction from images captured by a video camera has a long history. Conventional approaches include methods that use near-infrared light and methods that detect gaze direction from the reflection of a fluorescent lamp in the pupil.

First, a patent has been published (for example, Patent Document 1) that aims to eliminate situations in which the gaze position cannot be detected because the face moves while an ALS patient or the like is using a communication device, to prevent erroneous operation of the communication device, and to make the control device compact and economical. The device can be controlled easily and accurately, without erroneous operation, when the communication device is operated by gaze, and it is compact and economical. Specifically, the means of capturing the subject's face as data was changed from infrared light to a CCD camera, and the amount of facial image data was increased to cover not only the eye position but also the position and orientation of the face and the gaze direction. In addition, the monitor is divided into regions in advance, the region the subject is looking at is recognized and stored as image data (a dictionary), and in actual use the key the subject is looking at is determined by selecting image data from the dictionary. This eliminates erroneous operation and yields a compact, economical device.
The specific procedure for controlling the communication device easily and accurately by gaze input is as follows. The monitor of the communication device is divided in advance, and the subject looks at each divided region in turn. Each time, the CCD camera captures the subject's image (face position, orientation, and gaze direction), which is recognized and stored as data (the dictionary). When the communication device is used, the CCD camera reads the user's image data, and the image of the subject looking at a key on the monitor is compared with the dictionary prepared in advance. Similar image data is selected from the dictionary, and the recorded speech and operation procedure associated with the selected image data are executed. However, unlike the present invention, the device described in Patent Document 1 does not allow ALS patients and others to use general-purpose PC software through simple operation with eye function alone.

Next, a conventional technique for detecting gaze direction from images captured by a video camera is described. From images of the operator's face captured by two small cameras mounted on the display screen, an image processing device determines several points. A gaze direction calculation device stores as initial values several points obtained while the operator looks at a predetermined reference point on the display screen. It then determines the directions of the face and the eyes from the points determined by the image processing device and the stored initial values, and calculates the gaze direction from them. The calculated gaze direction is passed to an arithmetic processing unit. The distance to the operator is also determined, and the result is passed to the arithmetic processing unit. The arithmetic processing unit displays a cursor on the display screen only when the operator is closer than a predetermined distance, and moves the cursor according to the gaze direction (for example, Patent Document 2). However, unlike the present invention, the device described in Patent Document 2 does not allow ALS patients and others to use general-purpose PC software through simple operation with eye function alone.

Another device controls the cursor according to the user's intention by suspending and resuming gaze detection based on at least one of the user's gestures, voice, operations, gaze, and blinks. It comprises gaze detection means for detecting the user's gaze direction, cursor management means for managing a cursor-following mode that determines whether the cursor is moved to the detected gaze position, and a cursor control unit that moves the cursor to the detected gaze position (for example, Patent Document 3). However, like Patent Document 2 and unlike the present invention, the device described in Patent Document 3 does not allow ALS patients and others to use general-purpose PC software through simple operation with eye function alone.

T. N. Cornsweet et al. noted that, among the first to fourth Purkinje images produced by shining near-infrared light on the eye, the first Purkinje image (the reflection from the corneal surface) and the fourth Purkinje image (the reflection from the rear surface of the lens) together cancel the effect of head movement. This made high-precision gaze detection possible with the head only lightly fixed by a chin rest and headrest (Non-Patent Document 1). However, the optical system needed to separate and detect the signal from the fourth Purkinje image, which is only about 1/500 that of the first Purkinje image, becomes complex and bulky.

Iida and Tomono proposed a method that combines an eye camera based on the corneal-scleral reflection method with a three-dimensional magnetic sensor to detect the gaze point on a display screen regardless of the user's head movement (Non-Patent Document 2). However, the eye camera is an LBM-type device that exploits the difference in light reflectance between the cornea (the dark part of the eye) and the sclera (the white of the eye), and it must be worn on the head. In an accuracy evaluation, the average error between the detected gaze point and the target was 0.89 deg. It was also confirmed that combining gaze with a mouse is advantageous over pointing with the mouse alone when the target is far away. However, issues remained, such as the experiments having been run in the special case of a target and cursor displayed on a single-color background. Pointing experiments as close as possible to pointing at a display on an actual workstation were therefore carried out, and the effectiveness was confirmed under practical conditions (Non-Patent Document 3).

Tomono noted that whether a camera can capture the light that is reflected and exits through the pupil when near-infrared light is shone on the eye depends heavily on the placement of the illumination, and proposed a method of extracting the pupil using two illuminators with different placements (Non-Patent Document 4). The eyeball was approximated by a model of two overlapping spheres, and ray tracing was used to find one illumination placement under which the whole pupil is imaged with uniform brightness and another under which the pupil is imaged dark. Under these two placements, a bright-pupil image and a dark-pupil image are captured with the same camera, and the pupil is extracted by taking their difference.

Tomono and Kishino determine the spatial positions of three points on the face and of the pupil by stereo image measurement, measure the center of the eyeball from the positions of these feature points, and detect the gaze (Non-Patent Document 5). Three feature points with little motion are created on the face by having the subject wear lensless eyeglass frames with three marks on the rim. Each of the two camera systems captures the four feature points so that they fill the frame, which allows the relative three-dimensional positions of the feature points to be detected with an accuracy of about 0.1 to 0.15 mm.

Mukai et al. detect gaze direction by extracting feature patterns from grayscale images (Non-Patent Document 6). A notable feature is that ordinary fluorescent lighting is used, with no special illumination such as infrared light. A 100-inch display screen is divided into 3 × 3 regions, and the gaze is classified into nine directions. The input image is the part of the face above the nose, and a Sobel filter is used to detect the eye positions. An image of the area around each eye is cut out based on the detected eye position. Because skin color acts as noise and makes the iris position hard to obtain, colors close to the skin color are removed and the image is converted to grayscale to extract the feature pattern. The gaze direction is identified by matching against standard patterns. The standard patterns are created from training image data, with three horizontal and three vertical patterns prepared for each eye. In experiments, the accuracy of nine-direction classification was 71.4%. Considering that the display screen is 100 inches in size, the method is not considered practical.

Aoyama et al. obtain the gaze direction by adding the face direction to the rotation angle of the eyeball (Non-Patent Document 7). Through psychological experiments, they confirmed that gaze direction can be estimated from information on both eyes and the mouth. The input image is the whole upper body from the chest up, and the face region is extracted from the input image using edges and a mosaic pattern. Both eyes and the mouth are extracted by template matching. Matching is performed with images of both eyes and the mouth cut out in advance from a frontal image of the same person, and ten candidate regions are determined for each. From these candidates, the combination whose distances are appropriate for two eyes and a mouth is selected. The inner corners of the eyes and both edges of the face are also detected, and the face direction is estimated. The direction is corrected by comparing a projected image, generated by texture mapping, with the input image. In the evaluation experiment, feature extraction succeeded for 113 of 126 images (89.7%). The average error in the horizontal direction was 12.9 degrees with a cylindrical model and 10.2 degrees with a planar model. The vertical direction was left as future work.

Horiba et al. binarize an enlarged image of the region around the eye to extract the end point of the eyebrow and the center of the iris, and estimate the gaze direction from the change in the relative distance between the two points (Non-Patent Document 8). Placing the reference point on the face image is said to allow head movement, but because the image of the eye region is enlarged, the eye and eyebrow can easily move out of the camera frame when the head sways.

Nishiuchi et al. detect the face orientation by extracting facial feature points without markers, and detect the gaze by adding the position of the iris center (Non-Patent Document 9). From a face image binarized to black and white, the inner corners of both eyes are extracted, together with the midpoint between the closest points of the two nostrils, which serves as the nose feature point. The distances between the facial feature points of each individual are measured in advance, and from these distances the coordinates in three-dimensional space are calculated and the face orientation is estimated. Instead of infrared light, a fluorescent lamp is placed below the CRT, and no other illumination is assumed. The iris center is obtained from the reflection of the fluorescent lamp within the iris region and the end points of the iris.

Takegami and Goto estimate the gaze direction from the relative relationship between the corneal reflection image and the iris region (Non-Patent Document 10). They focus on the fact that the position of the reflected image of the light source on the cornea shifts relative to the iris region as the gaze direction changes. By using the corneal reflection image and the edges of the iris region as feature points, relatively high-precision measurement is possible with a single camera, without fixing the head or wearing markers. Whether the light source can produce a stable corneal reflection image remains an open question, but experiments confirmed detection with an accuracy (±0.5 to 0.9 deg.) roughly comparable to fixational eye movement.

Patent Document 1: JP 2001-350578 A
Patent Document 2: JP-A-5-298015
Patent Document 3: JP 2001-100903 A

Non-Patent Document 1: T. N. Cornsweet and H. D. Crane, "Accurate two-dimensional eye tracker using first and fourth Purkinje images," Journal Opt. Soc. Am., vol. 62, no. 8, pp. 921-928, 1973.
Non-Patent Document 2: Iida, Tomono, "A gaze point detection device that allows head movement and its application to pointing input," IEICE Transactions, D-II, no. 4, pp. 520-527, 1991.
Non-Patent Document 3: Tomono, Tetsutani, Kishino, "Evaluation of a pointing input method combining gaze and mouse," IEICE Transactions, D-II, no. 6, pp. 867-875, 1993.
Non-Patent Document 4: Tomono, "Design method of a pupil imaging optical system for gaze detection," IEICE Transactions, D-II, no. 6, 1991.
Non-Patent Document 5: Tomono, Kishino, "Gaze point detection algorithm based on three-dimensional position measurement of the face and pupil," IEICE Transactions, D-II, no. 5, pp. 861-872, 1992.
Non-Patent Document 6: Mukai, Mitani, Tokawa, "Gaze direction detection method by image processing," Proceedings of the 2nd Image Sensing Symposium, pp. 135-138, 1996.
Non-Patent Document 7: Aoyama, Yamamura, "Estimation of face and gaze direction using a single camera," IEICE Technical Report, PRU.95-233, pp. 131-136, 1996.
Non-Patent Document 8: Horiba, Lee, Inoue, "Gaze detection method using image processing and its application," 40th Annual Conference of the Institute of Systems, Control and Information Engineers, pp. 187-188, 1996.
Non-Patent Document 9: Nishiuchi, Shibata, Takada, "Study on non-contact gaze detection by image processing," Transactions of the Japan Society of Mechanical Engineers (Series C), vol. 64, no. 620, pp. 121-127, 1998.
Non-Patent Document 10: Takegami, Goto, "Gaze detection method using both the corneal reflection image and iris contour information," IEICE Transactions, D-I, vol. J82, pp. 1295-1303, 1999.

The object of the present invention is to construct a gaze input communication system using eye movement. A video camera acquires a face image of the patient, and image processing detects the patient's gaze direction without contact, so that the patient can select an intended item on the display screen. Input uses eye function alone (eye movement and the opening and closing of the eyelids), and a virtual board is operated through switching operations, which makes working from home possible.

To achieve the above object, in the first invention, from images that capture the subject's entire face, the difference between the pixel values of an image in which the subject's eyes are open and an image in which they are closed is calculated at each pixel, and a new image having these differences as its pixel values is created to obtain a difference image. The coordinates of the eye center are then obtained from the difference image, and templates of the eye and eyebrow are registered from those coordinates. During calibration, the PC screen is divided into several regions, and an image is registered for each direction as the subject looks at each region. The position of the iris is obtained by image processing, and the relative distance between the iris and the eyebrow, which serves as the per-direction reference for gaze direction detection, is obtained in advance. In the second invention, which is based on the first, the eye position is tracked continuously by high-speed template matching, while the calibration data and the gaze direction are obtained from an image in which the camera's zoom-in function captures the area around the eye at a large scale.
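The difference-image step of the first invention can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes grayscale images stored as nested lists, takes the absolute value of the per-pixel difference, and estimates the eye center as the centroid of pixels above a threshold. The function names, the threshold value, and the centroid rule are all assumptions introduced here; a practical system would also treat each eye region separately.

```python
def difference_image(open_img, closed_img):
    """Per-pixel absolute difference of two equally sized grayscale images."""
    if len(open_img) != len(closed_img) or any(
        len(a) != len(b) for a, b in zip(open_img, closed_img)
    ):
        raise ValueError("images must have the same shape")
    return [
        [abs(a - b) for a, b in zip(row_open, row_closed)]
        for row_open, row_closed in zip(open_img, closed_img)
    ]


def eye_center(diff, threshold=50):
    """Centroid (x, y) of pixels whose difference exceeds the threshold.

    The threshold and the centroid rule are illustrative assumptions.
    Returns None when no pixel changed enough.
    """
    xs, ys = [], []
    for y, row in enumerate(diff):
        for x, value in enumerate(row):
            if value > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

Because the rest of the face barely changes between the two frames, only the eye region survives the subtraction, which is why a simple threshold is enough in this sketch.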

In the third invention, which is based on the first, after the calibration a gaze pointer is operated on the PC screen, which is divided into several regions, in place of a mouse, using only eye movement and the opening and closing of the eyelids. Switching operations then make it possible to operate a virtual keyboard.

In the fourth invention, which is based on the first, gazing for 2 seconds or more at the screen section that contains the application to be launched enlarges that section. A scheme is adopted in which screen scrolling speeds up when the subject's gaze stays in the same direction for 3 seconds or more. After the application to be launched has been moved near the center of the display screen by this high-speed scrolling, a deliberate blink moves the gaze pointer to a position near the center of the display screen.
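The dwell-time rules of the fourth invention can be illustrated with a small sketch. The 2-second and 3-second thresholds come from the text; the class name, the function names, and the base and fast scroll speeds are hypothetical values chosen only for illustration.

```python
ENLARGE_DWELL_S = 2.0      # gaze on a section this long enlarges it (from the text)
FAST_SCROLL_DWELL_S = 3.0  # gaze in one direction this long speeds up scrolling


class DwellTracker:
    """Tracks how long the gaze has stayed on the same target."""

    def __init__(self):
        self.target = None
        self.start = None

    def update(self, target, now):
        """Return the seconds the gaze has stayed on `target` as of time `now`."""
        if target != self.target:
            # The gaze moved to a new target, so the dwell timer restarts.
            self.target = target
            self.start = now
        return now - self.start


def should_enlarge(dwell_s):
    """True once the gaze has rested on a section for 2 seconds or more."""
    return dwell_s >= ENLARGE_DWELL_S


def scroll_speed(dwell_s, base=1.0, fast=4.0):
    """Scroll speed for the current dwell; `base` and `fast` are assumed values."""
    return fast if dwell_s >= FAST_SCROLL_DWELL_S else base
```

Passing the timestamp in explicitly, rather than reading a clock inside the tracker, keeps the sketch deterministic and easy to check.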

With the gaze input communication system using eye movement of the present invention, a communication system has been built in which a video camera acquires a face image of the patient and image processing detects the patient's gaze direction without contact, so that the patient can select an intended item on the display screen. Furthermore, input is possible with eye function alone, a virtual board can be operated through switching while general-purpose PC software is used, and working from home becomes possible for ALS patients and others.

Next, an embodiment of the gaze input communication system using eye movement according to the present invention is described. Example 1 is described in detail with reference to FIGS. 1 to 21.

FIG. 1 is a hardware configuration diagram of the gaze input communication system using eye movement according to the present invention.
FIG. 2 is a schematic diagram of the gaze input communication system.
FIG. 3 is a front view showing a nine-section screen as an example of the communication screen (initial screen).
FIG. 4 shows (a) an example of the communication screen when "TV" is selected on the communication screen of FIG. 3, and (b) an example of the communication screen when "Change the channel" is selected on the screen of (a).
FIG. 5 is a flowchart showing the processing procedure of the system.
FIG. 6 is a flowchart showing the personal identification procedure.
FIG. 7 shows the input image: (a) the entire face, and (b) an enlarged view of the region around the eye.
FIG. 8 shows a difference image.
FIG. 9 shows eye position detection.
FIG. 10 shows template images by size.
FIG. 11 shows template matching by image size.
FIG. 12 shows examples of registered template images.
FIG. 13 shows matching results.
FIG. 14 shows an example of the nine direction-specific template images in the direction-specific image correlation method (method I).
FIG. 15 shows an example of iris tracking by high-speed template matching in the black pixel region detection method (method II).
FIG. 16 shows an example of preprocessing in the edge feature point detection method (method III).
FIG. 17 shows an example of edge detection by a Sobel filter in the edge feature point detection method (method III).
FIG. 18 shows an example of the detection of four contact points in the edge feature point detection method (method III).
FIG. 19 shows the experiment screen in the gaze direction detection experiment on subjects.
FIG. 20 shows an example of nine-direction gaze detection results in the gaze direction detection experiment on subjects.
FIG. 21 shows an example of twelve-direction gaze detection results in the gaze direction detection experiment on subjects.

FIG. 22 is a flowchart from acquisition of the gaze direction to use of the virtual keyboard or an application.
FIG. 23 is a screen view in which the screen area near the gaze pointer is enlarged and displayed at the center of the screen.
FIG. 24 is a screen view in which the pointer is moved to the vicinity of the application to be launched.
FIG. 25 is a screen view in which the position is confirmed in the window at the center of the display screen and the gaze pointer is moved onto the application.
FIG. 26 is a screen view in which the selected application moves when the eyelids are deliberately closed for 3 seconds or more.
FIG. 27 is a screen view that displays, at the center of the display screen, a selection decision area for selecting the application to be launched.
FIG. 28 is a screen view in which, when the gaze is directed at the application to be launched, the display screen scrolls so that the application comes near the center of the monitor.
FIG. 29 is a screen view in which the selection decision area is always displayed at a fixed position at the center of the monitor and the application to be launched is moved into it.
FIG. 30 is a screen view in which the application inside the area is launched when the eyelids are deliberately closed for 3 seconds or more.
FIG. 31 is the initial screen.
FIG. 32 is a screen view for gazing at and selecting the section that contains the application.
FIG. 33 is a screen view in which the selected section is enlarged.
FIG. 34 is a screen view for scrolling the display screen by directing the gaze at the application.
FIG. 35 is a screen view in which the pointer is displayed after a deliberate blink.
FIG. 36 is a screen view for moving the pointer onto the application and launching it.
FIG. 37 shows measurement results for the gaze-pointer vicinity enlargement method (average of trials 1 to 5).
FIG. 38 shows measurement results for the screen scroll method (average of trials 1 to 5).
FIG. 39 shows measurement results for the divided-region enlargement method (average of trials 1 to 5).
FIG. 40 shows measurement results for the pointer vicinity enlargement method (average of trials 6 to 10).
FIG. 41 shows measurement results for the screen scroll method (average of trials 6 to 10).
FIG. 42 shows measurement results for the divided-region enlargement method (average of trials 6 to 10).

First, ALS (amyotrophic lateral sclerosis) is a progressive neurological disease designated by the national government as a specified disease. Its prevalence is about 5 per 100,000 people, and 90% of cases begin in middle age or later. The ratio between the sexes is 1:1.5, with men affected slightly more often. There are about 4,500 patients in Japan. Since the disease was first defined in 1874 by the French physician Charcot, it is said that there has been neither a cure nor a medical means of slowing its progression. As symptoms progress, the motor nerves are affected, and weakness and atrophy advance in the muscles of the limbs, the muscles used for swallowing, and the respiratory muscles. Usually within 4 to 5 years of onset the patient develops complete paralysis of the limbs; the muscles of the whole body, not only the arms and legs, become paralyzed, and the patient can no longer speak. Eventually the respiratory muscles are also affected, and the patient cannot survive without a ventilator. However, intelligence, sensation, and eye movement remain normal, and intellectual and creative activity is still possible. In the United States the disease is also called Lou Gehrig's disease, after the Major League Baseball player Lou Gehrig, who had it. The well-known British astrophysicist Dr. Hawking has also lived with the disease for some 30 years.

Because of the paralysis of their limbs, ALS patients 20 need nursing care, and once they lose the ability to speak as the disease progresses, communicating with medical staff, caregivers, and family also becomes difficult. As a result, some medical institutions decline to admit ALS patients 20, because communication does not go smoothly and the care burden is heavy. Communication devices that improve the patient's QOL (Quality of Life) are therefore desired. Tools and devices that help people with impaired ability to convey intentions and information communicate more smoothly by using their residual functions are collectively called communication aids. The communication aids used by people with physical disabilities range from simple tools such as letter boards to high-tech devices that apply various engineering technologies.

The present invention aims to develop a gaze input communication system using eye movement for such ALS patients. A face image of the patient is acquired with a video camera, and the patient's gaze direction is detected without contact by image processing. The goal is a gaze input communication system in which the detected gaze direction is used to identify which position on the display the patient is looking at, so that the patient can select an intended item.

FIG. 1 is a hardware configuration diagram of the gaze input communication system using eye movement according to the present invention. In FIG. 1, the arithmetic processing unit 18 determines the position of the eyes of the ALS patient (user or subject) 20 from video of the patient's face captured by a video camera 12 mounted on top of the display screen 11 of a personal computer (hereinafter, PC; the term covers the functions of both the display screen 11 and the arithmetic processing unit 18). To calculate the gaze direction, the eye positions observed while the patient looks at several predetermined reference points on the PC are stored as initial values. The direction of the eyeball is then determined from the eye position found by the arithmetic processing unit 18 and the eye positions stored as initial values, and the gaze direction is calculated from these. The calculated gaze direction is passed to the arithmetic processing unit 18. When a shift in the position of the patient's head is detected in the image from the small video camera 12 obtained through the image capturing device 14, the arithmetic processing unit 18 sends up/down and left/right correction values for the video camera 12 to the camera control device 16, which moves the video camera 12 by those amounts to compensate for the shift. The arithmetic processing unit 18 moves the cursor displayed on the PC in the direction corresponding to the gaze direction. In addition, by blinking or gazing deliberately, the patient can issue a command with the eyes, equivalent to clicking with an ordinary PC mouse. The region the ALS patient 20 is looking at can also be estimated from the calculation results obtained via the image capturing device 14 and displayed (output) in a different color. Ordinary fluorescent room lighting is sufficient; no infrared source or special lighting needs to be installed.

FIG. 2 is an explanatory diagram of the gaze input communication system. The system is a non-contact communication system consisting mainly of a single PC and the video camera 12. It is intended for ALS patients 20 and similar users and is assumed to be used in bed. Both the PC and the video camera 12 are commercially available products, which keeps the system relatively inexpensive. An ALS patient or similar user selects an intended item by gazing at one of the partitioned regions on the display screen 11 of the PC (see FIG. 2). A liquid crystal display, which is easy to install, is a good choice for the display screen 11. Calibration is required when the system is first used. In calibration (registration of direction reference images), the sections of the display screen 11 are flashed one after another (changed to a color different from the others), and the ALS patient 20 follows them with the eyes. The eyeball position and related values at each moment are recorded and serve as the reference for determining direction.

FIG. 3 is a front view showing a nine-division screen as an example of the communication screen (initial screen). The present invention is a communication system intended in particular for severely affected ALS patients 20. The display screen 11 is divided into 9 regions, as in FIG. 3, or into 12 regions, and each region presents an expression that is important to the ALS patient 20. This screen is hereinafter called the communication screen. The items presented were chosen with reference to the results of a questionnaire given to families of ALS patients 20 and to hospitals. The ALS patient 20 selects an intended item by gazing at it. Once the gaze has been held for 2 seconds or longer, the PC recognizes it as a deliberate gaze and selects the item. The selected item is read aloud using a voice recording registered in advance. Nine or twelve regions cannot present every expression a patient needs, but more expressions can be offered by placing sub-items beneath an item. "Yes" and "No", which are expected to be used frequently, are shown on the initial screen.
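The 2-second gaze rule can be illustrated with a minimal sketch. It assumes the gaze detector delivers a stream of (timestamp, region index) samples; the function name `dwell_select` and the sample format are hypothetical and are not taken from the patent.

```python
from typing import Iterable, Optional, Tuple

DWELL_SECONDS = 2.0  # gaze duration stated in the text


def dwell_select(samples: Iterable[Tuple[float, Optional[int]]],
                 dwell: float = DWELL_SECONDS) -> Optional[int]:
    """Return the first region gazed at continuously for `dwell` seconds.

    `samples` yields (timestamp_in_seconds, region_index). region_index is
    None when no region could be determined for that frame (assumption).
    """
    current = None   # region currently being gazed at
    started = 0.0    # time at which the current gaze began
    for t, region in samples:
        if region != current:
            # Gaze moved to another region: restart the timer.
            current, started = region, t
        elif region is not None and t - started >= dwell:
            return region
    return None
```

For example, a gaze that rests on region 5 from t = 1.0 s to t = 3.0 s selects region 5, while a gaze held for only 1.9 s selects nothing.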

FIG. 4 shows examples of the communication screen: FIG. 4(a) is the screen displayed when "TV" is selected on the communication screen of FIG. 3, and FIG. 4(b) is the screen displayed when "Change the channel" is then selected on the screen of FIG. 4(a). For example, if the item "TV" is selected on the initial screen, the next screen shows TV-related items, as in FIG. 4(a). The two items "Back" and "To menu" are always displayed on every screen other than the initial screen. Selecting "Change the channel" here displays FIG. 4(b). Because TV channels up to 12 are commonly used, selecting "Next" displays channels 7 through 12. When the patient selects the desired number (5 in this example), the system reads aloud "Please change to channel 5."

FIG. 5 is a flowchart showing the processing procedure of the system. First, with the whole face in view, the ALS patient 20 (or subject) opens and closes the eyes, and the eye position is detected from the difference image (100). Next, when several ALS patients 20 share one system, personal authentication by template matching (image correlation) is performed (102), so that the settings needed for each individual can be selected. After the eye position has been detected, the camera zooms in (104) to acquire an image in which the area around the eyes appears large. The eye and eyebrow templates are then registered (106), and calibration is performed (108). During calibration, an image is registered for each direction, the position of the iris and related values are obtained by the image processing methods proposed below, and a reference value for each direction (the relative distance between the iris and the eyebrow) is computed for gaze direction detection. After calibration, gaze direction detection (110) is applied to each input image by the proposed method: the gaze direction is detected by comparing the data obtained during calibration (iris position and so on) with the data obtained from the input image. The ALS patient 20 can thus select an intended item on the display screen 11 by looking at it. The region the ALS patient 20 is looking at is output in a different color so that it can be confirmed. An intended item is selected by gazing at its region for 2 seconds or longer. If sub-items exist beneath the selected item (112), the screen switches and the sub-items are displayed. The content of the finally selected item is read aloud (114).

Personal authentication (102) is performed using the image processing methods described above. Gaze detection varies from person to person, so the purpose of personal authentication (102) is to register each individual's data in advance, authenticate each ALS patient 20, and immediately retrieve that patient's data. Personal authentication by face image (102) is also expected to play an important role in the security field in the future.

FIG. 6 shows the personal authentication procedure. The subject is assumed to be able to open and close the eyes. First, the subject opens and closes the eyes, and the eye position is detected from the difference image (120). Matching against template images registered in advance (122) then identifies the individual (124). A personal identification number is assigned to each individual and is entered by opening and closing the eyes (126). The personal identification number here is the order in which the eyes are opened and closed, entered one eye at a time, for example "right, left, left, right". If the number is entered correctly, personal authentication is complete (128).
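Entry of the eye-closure sequence can be sketched as follows. The sketch assumes that an upstream step already reports, for every frame, whether each eye is open; the names `blink_sequence` and `pin_matches` and the per-frame format are hypothetical.

```python
def blink_sequence(frames):
    """Convert per-frame eye states into a list of closure events.

    `frames` is an iterable of (left_open, right_open) booleans (assumed
    format). Each transition from open to closed yields "left" or "right".
    If both eyes close in the same frame, both events are recorded.
    """
    events = []
    prev_left, prev_right = True, True
    for left_open, right_open in frames:
        if prev_left and not left_open:
            events.append("left")
        if prev_right and not right_open:
            events.append("right")
        prev_left, prev_right = left_open, right_open
    return events


def pin_matches(frames, registered):
    """True when the observed closure order equals the registered order."""
    return blink_sequence(frames) == list(registered)
```

A real system would probably also need to reject natural blinks, for instance by requiring a minimum closure duration; that is left out of this sketch.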

When face image recognition is performed by template matching, regions such as hair, if included in the image, tend to change over time and make recognition difficult. Four template patterns covering different regions were therefore prepared for matching: "eyes + eyebrows", "eyes + eyebrows + nose", "eyes + eyebrows + nose + mouth", and "eyes + eyebrows + nose + mouth + cheek contour" (FIG. 10). In template matching with the "eyes + eyebrows" and "eyes + eyebrows + nose" regions, the correlation value for the correct person was confirmed to be high, exceeding 0.99 (FIG. 11). In template matching with "eyes + eyebrows + nose + mouth" and "eyes + eyebrows + nose + mouth + cheek contour", the correct person still gave the highest correlation value, so these templates are usable, but the correlation values were lower than with the other two. Template matching with the "eyes + eyebrows" or "eyes + eyebrows + nose" region is therefore considered optimal.

Using templates (standard images) containing the "eyes + eyebrows + nose" region, 10 template images (80 × 60) were prepared for each of 10 subjects 20, and the template images were matched against each individual's input image. Identifying individuals from the correlation values proved to be possible. Combining template matching with entry of a personal identification number by opening and closing the eyes allowed individuals to be authenticated with high probability.
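Identification from correlation values can be sketched as below. The text does not state which correlation formula was used; the zero-mean normalized correlation coefficient here is an assumption, as are the names `ncc` and `identify` and the representation of image patches as flat lists of pixel values.

```python
import math


def ncc(a, b):
    """Zero-mean normalized correlation of two equal-length pixel lists.

    Returns a value in [-1, 1]; values near 1 mean the patches are similar.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def identify(patch, templates):
    """Return (name, score) of the best-matching registered person.

    `templates` maps a person's name to a list of template patches, each
    the same length as `patch` (hypothetical data layout).
    """
    score, name = max((ncc(patch, t), name)
                      for name, patches in templates.items()
                      for t in patches)
    return name, score
```

In practice the patch would be cut from the input image at the location where the template fits best, and a minimum score could be required before accepting the match.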

In the present invention, images captured by the video camera 12 are processed in the PC to detect the gaze direction of the ALS patient 20, and the result is used to operate the communication system. The goal is specifically a communication system for severely affected ALS patients 20, so it is reasonable to perform gaze direction detection on the assumption that, because of the symptoms of ALS, the head of the ALS patient 20 does not move much. The system is intended for use in hospitals and in rooms at home, so it is also important that it works properly under ordinary lighting such as fluorescent lamps. In the present invention it was considered desirable that gaze detection work with ordinary indoor fluorescent lighting alone; infrared light was not used, and no additional light sources such as extra fluorescent lamps were installed. A method that imposes no lighting conditions has been reported (Non-Patent Document 6), but its low detection accuracy remains a problem.

The gaze direction detection of the present invention is required to be accurate enough that, when the display screen 11 of the PC is divided into several regions, each region can be selected correctly. Three gaze direction detection methods were examined for meeting these conditions: a direction-specific image correlation method using template matching (method I), a black pixel region detection method (method II), and an edge feature point detection method that focuses on the edges of the iris region (method III). The eyeball makes small involuntary movements called fixational eye movements, so even while a person gazes at a single point the line of sight is known to deviate from that direction by about 0.3 degrees (Yamada and Fukuda, "Definition of the gaze point in images and its application to image analysis," 電子通信学会論文誌, D-II, No. 9, pp. 1335-1342, 1986). In this system, however, a deviation of 0.3 degrees is unlikely to cause errors when the user gazes within a designated region, so it was not taken into account.

In the direction-specific image correlation method (method I), images of the area around the eye were registered as templates, one per direction, during calibration (FIG. 14). The eye position was detected by fast template matching using a pyramid structure. The registered direction templates were then matched at the detected eye position, and the gaze direction was taken from the direction image giving the highest correlation.

In the black pixel region detection method (method II), the iris region including the pupil (hereinafter, the iris region) was registered as a template image (60 × 60 pixels), and template matching was performed on an enlarged image of the eye area (the input image) to determine the position of the iris region (FIG. 15).

In the edge feature point detection method (method III), the gaze direction was detected by edge detection, focusing on the change in luminance between the iris region of the eyeball and the white of the eye and the eyelid. To make edge detection easier, the image was enhanced and smoothed with a median filter, and edges were then detected with a Sobel filter.
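The smoothing and edge detection steps of method III can be sketched in plain Python. The 3 × 3 window size, the handling of border pixels, and the function names are assumptions; the image enhancement step and the extraction of feature points from the edges are not shown.

```python
from statistics import median


def median3x3(img):
    """3x3 median filter on a 2-D list; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = median(img[y + dy][x + dx]
                               for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out


def sobel_magnitude(img):
    """Sobel gradient magnitude; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

The median filter removes isolated noise pixels without blurring the iris boundary much, which is presumably why it precedes the Sobel step.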

Correcting for head position markedly improved the detection accuracy of methods II and III (FIGS. 20 and 21). In method I, the gaze direction is detected from the matching correlation values, which makes head position correction difficult; even so, the average correct-answer rate exceeds 80% for both 9 and 12 directions.

Patients 20 with severe ALS lose free movement of the hands, mouth, and other parts of the body, and their communication with third parties breaks down. The object of the present invention is to construct a communication support system that uses eye movement, one of the residual functions of the ALS patient 20. A further aim is a non-contact communication system, which places the least burden on the ALS patient 20. Building an inexpensive system from only a commercially available video camera 12 and a PC is also an aim.

In Example 1, face images were captured with the video camera 12, using students as subjects 20 in place of ALS patients 20, and loaded into the PC, where various image processing operations were applied. The input images are 320 × 240 pixel RGB images with 256 gradation levels (FIG. 7). The video camera 12 has a zoom function, so the magnification of the captured image can be set freely.

The eye position detection method is as follows. The eye position in the input image was determined by focusing on the opening and closing of the eyelids and finding where the luminance changes markedly between successive images. Each newly recorded image was compared with a previously recorded image: the difference between the values at each pixel was computed, and a new image having these differences as its pixel values was generated. This is hereinafter called the difference image.
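A difference image of this kind can be sketched in a few lines. Taking the absolute value of the difference is an assumption (the text says only that the per-pixel difference is used), and grayscale images stored as 2-D lists are assumed for simplicity.

```python
def difference_image(current, previous):
    """Per-pixel absolute difference of two equal-sized grayscale images."""
    return [[abs(c - p) for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, previous)]
```

Pixels around the eyelids change strongly between an open-eye frame and a closed-eye frame, so they stand out in the result while the static parts of the face stay near zero.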

The eye position detection procedure using this difference image is as follows.
(1) The image is divided into Y strips in the vertical direction. The m-th strip from the top (1 ≤ m ≤ Y) is called horizontal division region Hm (FIG. 8).
(2) The pixel values in horizontal division region Hm are summed, and the sum is denoted Hm#SUM.
(3) The index m of the horizontal division region giving the largest Hm#SUM is adopted as the vertical position of the eyes.
(4) Within the adopted horizontal division region Hm, two contiguous intervals with large values are detected. Contiguous intervals of non-zero values could be used, but to avoid the influence of noise from the video camera 12 and other sources, contiguous intervals with values at or above a certain threshold were used.
(5) The centers of the two intervals, n1 and n2, are adopted as the horizontal positions of the eyes. By these steps, the eyes of the subject 20 are located at the positions (n1, m) and (n2, m) (FIG. 9).
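Steps (1) to (5) can be sketched as follows. The sketch makes several assumptions that the text leaves open: the "values" in step (4) are taken to be column sums within the selected strip, the two intervals kept are those with the largest totals, and the threshold is positive. The function name `locate_eyes` is hypothetical.

```python
def locate_eyes(diff, strips, threshold):
    """Locate both eyes in a difference image (2-D list of pixel values).

    Returns (m, n1, n2): the 1-based strip index and the two horizontal
    eye positions, or None when fewer than two intervals are found.
    Rows beyond strips * (height // strips) are ignored.
    """
    h, w = len(diff), len(diff[0])
    strip_h = h // strips
    # (1)-(2): sum the pixel values of each horizontal strip.
    sums = [sum(sum(row) for row in diff[i * strip_h:(i + 1) * strip_h])
            for i in range(strips)]
    # (3): the strip with the largest sum gives the vertical position.
    best = max(range(strips), key=sums.__getitem__)
    band = diff[best * strip_h:(best + 1) * strip_h]
    # (4): column profile of the strip, then runs at or above the threshold.
    profile = [sum(row[x] for row in band) for x in range(w)]
    runs, start = [], None
    for x, v in enumerate(profile + [0]):  # trailing 0 closes an open run
        if v >= threshold and start is None:
            start = x
        elif v < threshold and start is not None:
            runs.append((start, x - 1))
            start = None
    runs.sort(key=lambda r: sum(profile[r[0]:r[1] + 1]), reverse=True)
    two = sorted(runs[:2])
    if len(two) < 2:
        return None
    # (5): the center of each run is the horizontal eye position.
    (a0, a1), (b0, b1) = two
    return best + 1, (a0 + a1) // 2, (b0 + b1) // 2
```

The threshold plays the role described in step (4): small isolated differences caused by camera noise never start an interval.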

After the eye position had been found from the difference image, it was tracked continuously by fast template matching using a pyramid structure, which is the image processing needed for eye position tracking. Whether the eyes were open or closed was judged by binarization.
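The idea behind pyramid-based fast template matching can be illustrated with a two-level coarse-to-fine search. The text gives no details of the pyramid, so everything here is an assumption: two levels, 2 × 2 averaging for downsampling, and the sum of absolute differences (SAD) as the similarity measure in place of a correlation value. All function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (odd edges dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def sad(img, tpl, top, left):
    """Sum of absolute differences between the template and an image window."""
    return sum(abs(img[top + y][left + x] - tpl[y][x])
               for y in range(len(tpl)) for x in range(len(tpl[0])))


def search(img, tpl, candidates):
    """Return the candidate (top, left) with the smallest SAD."""
    th, tw = len(tpl), len(tpl[0])
    valid = [(t, l) for t, l in candidates
             if 0 <= t <= len(img) - th and 0 <= l <= len(img[0]) - tw]
    return min(valid, key=lambda p: sad(img, tpl, p[0], p[1]))


def pyramid_match(img, tpl):
    """Full search at half resolution, then refinement at full resolution."""
    small_img, small_tpl = downsample(img), downsample(tpl)
    all_pos = [(t, l)
               for t in range(len(small_img) - len(small_tpl) + 1)
               for l in range(len(small_img[0]) - len(small_tpl[0]) + 1)]
    ct, cl = search(small_img, small_tpl, all_pos)
    # Only the 3x3 neighbourhood of the coarse hit is examined in detail.
    near = [(2 * ct + dt, 2 * cl + dl)
            for dt in (-1, 0, 1) for dl in (-1, 0, 1)]
    return search(img, tpl, near)
```

The saving comes from running the exhaustive search on an image with a quarter of the pixels and checking only nine positions at full resolution.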

Personal authentication was performed using the image processing methods described above. Gaze detection varies from person to person, so the purpose of personal authentication is to register each individual's data in advance, authenticate each subject 20, and immediately retrieve that subject's data. Personal authentication by face image is also expected to play an important role in the security field in the future.

As described earlier, the personal authentication procedure follows FIG. 6. First, the subject opened and closed the eyes, and the eye position was detected from the difference image (120). Matching against template images registered in advance (122) was then performed to identify the individual. A personal identification number is assigned to each individual and is entered by opening and closing the eyes (126). The personal identification number here is the order in which the left and right eyes are opened and closed, and it was entered by opening and closing one eye at a time, for example "right, left, left, right". If the number is entered correctly, personal authentication is complete.

When face image recognition is performed by template matching, regions such as hair, if included in the image, tend to change over time and make recognition difficult. Four template patterns covering different regions were therefore prepared for matching: "eyes + eyebrows", "eyes + eyebrows + nose", "eyes + eyebrows + nose + mouth", and "eyes + eyebrows + nose + mouth + cheek contour". Examples of the template images of each size are shown in FIG. 10. With six subjects 20, images acquired in advance were matched against the template images of the four sizes. The matching results are shown in FIG. 11.

In template matching with the "eyes + eyebrows" and "eyes + eyebrows + nose" regions, the correlation value for the correct person was confirmed to exceed 0.99. In template matching with "eyes + eyebrows + nose + mouth" and "eyes + eyebrows + nose + mouth + cheek contour", the correct person still gave the highest correlation value, so these templates are usable, but the correlation values were lower than in template matching with the "eyes + eyebrows" and "eyes + eyebrows + nose" regions. Template matching with the "eyes + eyebrows" or "eyes + eyebrows + nose" region is therefore considered optimal.

Whether individuals can be identified using templates containing the "eyes + eyebrows + nose" region, which was judged best in the template matching experiment by image size, was examined experimentally. Ten template images (80 × 60) were prepared for each of 10 subjects 20 and matched against the input images, and individuals were identified from the correlation values. FIG. 12 shows examples of the registered template images.

FIG. 13 shows an example of the comparison between individuals obtained by matching the template images (80 × 60) against an input image (320 × 240) as described above. The vertical axis is the correlation value between the input image and the template image; the closer it is to 1, the more similar the two images. The horizontal axis is the 100 prepared template images (10 per person for 10 people). For each individual registered in advance, the 10 template images were matched against that individual's input image. As FIG. 13 shows, the correct person could be recognized by this template matching. Combining template matching with entry of a personal identification number by opening and closing one eye at a time allowed individuals to be authenticated with a higher probability.

In Example 2, as in Example 1, students served as subjects 20 in place of ALS patients 20. Images captured by the video camera 12 were processed in the PC to detect the gaze direction, which was used to operate the communication system. The goal is specifically a communication system for severely affected ALS patients 20, so it is reasonable to perform gaze direction detection on the assumption that, because of the symptoms of ALS, the head of the patient 20 does not move. The system is intended for use in hospitals and in rooms at home, so it is also important that it works properly under ordinary lighting such as fluorescent lamps.

The gaze direction detection of the present invention is required to be accurate enough that, when the display screen 11 of the PC is divided into several regions, each region can be selected correctly. Three gaze direction detection methods were examined for meeting these conditions: a direction-specific image correlation method using template matching, a black pixel region detection method, and an edge feature point detection method that focuses on the edges of the iris region. The eyeball makes small involuntary movements called fixational eye movements, so even while a person gazes at a single point the line of sight is known to deviate from that direction by about 0.3 degrees (Yamada and Fukuda, "Definition of the gaze point in images and its application to image analysis," 電子通信学会論文誌, D-II, No. 9, pp. 1335-1342, 1986). In this system, however, a deviation of 0.3 degrees is unlikely to cause errors when the user gazes within a designated region, so it was not taken into account.

In the black pixel region detection method (method II), the iris region including the pupil (hereinafter, the iris region) was registered as a template image (60 × 60 pixels), and template matching was performed on an enlarged image of the eye area (the input image) to determine the position of the iris region (FIG. 15). To shorten the computation time, template matching using a pyramid structure (Takagi and Shimoda, "画像解析ハンドブック" (Image Analysis Handbook), 東京大学出版会, 1978) was used. Because a slight movement of the head changes the position of the iris region and introduces errors into gaze direction detection, a reference point was taken at the upper left of the eyebrow, which changes little with eye movement. Previously the reference had been taken over the whole region including the eye and the eyebrow, which left the problem that the reference point moved together with the eyeball. With this improvement, the gaze direction could be detected from the change in the relative distance between the reference point and the iris.

In FIG. 15(b), the upper-left corner coordinates of the template containing the iris are denoted (X_e, Y_e), and the upper-left corner coordinates of the template taken near the eyebrow are denoted (X_b, Y_b). Letting (L_x, L_y) be the distance between these two points, the following expressions are obtained.
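The expressions referred to here do not appear in this text. From the definitions just given, a plausible reading, offered as an assumption (including the sign convention), is the component-wise difference between the two template corners:

```latex
L_x = X_e - X_b, \qquad L_y = Y_e - Y_b
```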

The measured values were compared with the direction-specific values L_x^(n), L_y^(n) (0 < n ≤ N, where N is the number of divisions) registered during calibration, and the n-th item giving the smallest weighted Euclidean distance was adopted as the gaze direction. In everyday life, however, people often follow an object by moving the head rather than by eye movement alone, and accordingly the head tended to move naturally when subjects followed the partitioned regions on the display screen 11. Because the present invention is intended for people with severe disabilities, there is no need to consider head movements large enough for the eye or eyebrow to leave the input image, but slight head movements may well introduce error into gaze direction detection. The head movement was therefore cancelled by adding the following correction terms to L_x and L_y. The correction terms were obtained as follows.

Here, X_c and Y_c are constants determined experimentally. (X_bo, Y_bo) are the initial coordinates of the upper-left eyebrow reference point in FIG. 15(a), and (X_b, Y_b) are the upper-left eyebrow coordinates in the input image at the current time.
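The following sketch puts the pieces of Method II together: the offset between the iris and eyebrow templates, a head-movement correction, and selection of the calibrated direction with the smallest weighted Euclidean distance. The exact form of the correction term is not given in this text; the linear form used here (the constants X_c, Y_c multiplied by the eyebrow displacement from its initial position) is an assumption, as are the sign conventions, the weights, and all function names.

```python
import math

def eye_offset(eye_xy, brow_xy):
    """(L_x, L_y): offset of the iris template from the eyebrow reference point."""
    return (eye_xy[0] - brow_xy[0], eye_xy[1] - brow_xy[1])

def head_correction(brow_xy, brow_init_xy, xc, yc):
    """ASSUMED form: correction proportional to the eyebrow displacement."""
    return (xc * (brow_xy[0] - brow_init_xy[0]),
            yc * (brow_xy[1] - brow_init_xy[1]))

def classify_direction(offset, calibration, wx=1.0, wy=1.0):
    """Return the key n of the calibrated offset closest to `offset`.

    calibration maps n -> (L_x^(n), L_y^(n)); wx, wy are hypothetical weights.
    """
    def dist(ref):
        return math.sqrt(wx * (offset[0] - ref[0]) ** 2
                         + wy * (offset[1] - ref[1]) ** 2)
    return min(calibration, key=lambda n: dist(calibration[n]))

# Example with made-up numbers.
calibration = {1: (10, 20), 2: (20, 20), 3: (30, 20)}
lx, ly = eye_offset((69, 61), (50, 40))                    # (19, 21)
dx, dy = head_correction((50, 40), (50, 40), 0.5, 0.5)     # no head movement
print(classify_direction((lx + dx, ly + dy), calibration))  # 2
```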

In the edge feature point detection method (Method III), attention was paid to the luminance changes between the iris region (the dark part of the eye) and the sclera and eyelids, and the gaze direction was detected using edge detection. To facilitate edge detection, image enhancement and smoothing with a median filter were applied first, and edges were then detected with a Sobel filter. Smoothing is a technique for removing or reducing noise superimposed during image acquisition. Noise is an undesirable abrupt change in intensity in a region that should vary gradually, so smoothing, which turns abrupt changes into gradual ones, reduces noise. Several smoothing methods have been proposed; here a median filter was used. Compared with local averaging (Hasegawa, "Basic Techniques of Image Processing: Introduction," Gijutsu-Hyoron Sha, 1986), the median filter has the advantages that (1) it removes noise more effectively, (2) it smooths small fluctuations, and (3) it blurs edges less.

Image processing techniques for edge detection such as the Roberts and Prewitt operators (Tsuchiya, Fukada, "Image Processing," Corona Publishing, 1990) could also be used, but here a Sobel filter was used to detect the boundaries where the iris region meets the sclera and the eyelids. Because the minimum spacing in a digital image is 1 pixel, differences are used in place of derivatives; the absolute value of the difference represents the edge strength, in other words a numerical measure of how edge-like a location is.
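For reference, the standard 3 × 3 Sobel kernels and their application can be sketched in plain Python as below. The kernels are the usual textbook ones; the border handling (border pixels left at zero) and the function name are choices made for this illustration only.

```python
# Standard Sobel kernels: SOBEL_X responds to horizontal intensity changes,
# SOBEL_Y to vertical ones.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def apply3x3(img, kernel):
    """Correlate a grayscale image (list of rows) with a 3x3 kernel.

    Border pixels are left at 0 in this sketch.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out

# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 255, 255, 255] for _ in range(5)]
gx = apply3x3(step, SOBEL_X)
gy = apply3x3(step, SOBEL_Y)
```

The sign of each response indicates the direction of the transition (dark to bright or bright to dark), and its absolute value is the edge strength described above.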

From the input image containing the eye and eyebrow (320 × 240 pixels), the region around the eye was extracted by template matching; the extracted image was 160 × 80 pixels (FIG. 16(a)). First, to facilitate edge detection, the image was enhanced as preprocessing and then smoothed with a 3 × 3 pixel median filter, which preserves edge information, to make the luminance differences between the iris region and the sclera and eyelids clear (FIG. 16(b)). If edge detection is performed at this stage, light reflection creates a white boundary between the iris and the lower eyelid, which makes detection difficult. A contrast transformation that triples every pixel value was therefore applied, so that the skin and sclera regions all became white (FIG. 16(c)). Converting the skin region to white as well further emphasized the boundary between the iris and the eyelids and made edge detection easier.
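The two preprocessing steps described above, a 3 × 3 median filter followed by tripling every pixel value, can be sketched as follows. The sketch assumes 8-bit grayscale values that saturate at 255, which is what makes bright skin and sclera pixels turn uniformly white; the border handling and function names are illustrative choices.

```python
from statistics import median

def median3x3(img):
    """3x3 median filter; border pixels are copied unchanged in this sketch."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = median(img[y + j][x + i]
                               for j in (-1, 0, 1) for i in (-1, 0, 1))
    return out

def triple_contrast(img, max_value=255):
    """Multiply every pixel by 3, saturating at max_value (assumed 8-bit)."""
    return [[min(max_value, 3 * p) for p in row] for row in img]

# An isolated bright pixel (e.g. a specular highlight) is removed by the median.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
smoothed = median3x3(noisy)
stretched = triple_contrast([[20, 90, 200]])
```

With these assumptions, a dark iris pixel of value 20 stays dark (60), while anything at 85 or above saturates to white.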

Next, the Sobel filter was applied separately in the horizontal and vertical directions to detect the left and right ends of the iris and the boundaries where the iris meets the eyelids. Locations where the Sobel output was positive (white to black) were drawn in blue, and locations where it was negative (black to white) were drawn in green (FIG. 17). This distinguishes the left side of the iris from the right, and the upper iris-eyelid boundary from the lower one. In FIG. 17(a), the coordinates at which the sum over about 10 neighboring pixels in the oblique direction was highest were adopted as the left and right ends of the iris, X_R and X_L. Then, in FIG. 17(b), attention was paid to the midpoint of X_R and X_L: along the vertical line through this midpoint, the points at which the sum over about 10 horizontally neighboring pixels was largest were adopted as the contact points with the eyelids, Y_U and Y_D (FIG. 18). Finally, as in the black pixel region detection method, high-speed template matching was performed with the eyebrow as the reference, and the distances L_xR, L_xL, L_YT, and L_YB from the resulting (X_b, Y_b) were computed to detect the gaze direction.
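The search for the eyelid contact points can be illustrated with the simplified sketch below. It assumes a signed vertical edge map `gy` whose values are positive where the intensity goes from white to black moving downward (the upper eyelid boundary) and negative for black to white (the lower boundary), and it sums the response over a window of about 10 pixels centred on the midpoint of the iris ends. The function name, the window handling, and the sign convention are assumptions of this sketch, not details confirmed by the source.

```python
def eyelid_contacts(gy, x_left, x_right, half_width=5):
    """Return (x_mid, y_upper, y_lower) from a signed vertical edge map.

    gy: list of rows of signed edge responses (assumed sign convention:
        positive = white-to-black going down, negative = black-to-white).
    x_left, x_right: previously detected left and right ends of the iris.
    """
    x_mid = (x_left + x_right) // 2
    lo = max(0, x_mid - half_width)
    hi = min(len(gy[0]), x_mid + half_width)
    # Sum about 10 horizontally neighbouring responses on every row.
    row_sums = [sum(row[lo:hi]) for row in gy]
    y_upper = max(range(len(gy)), key=lambda y: row_sums[y])
    y_lower = min(range(len(gy)), key=lambda y: row_sums[y])
    return x_mid, y_upper, y_lower
```

Summing over a short run of neighbouring pixels rather than taking a single response makes the choice less sensitive to isolated noisy pixels.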

A gaze direction detection experiment was conducted with four subjects 20: three men (one wearing glasses) and one woman. The distance from the eyes of the subject 20 to the display screen 11 was 75 cm, and the distance to the video camera 12 was 85 cm. The room was lit by ordinary ceiling-mounted fluorescent lamps, and the lighting did not change significantly during the experiment. The display screen 11 was partitioned into 9 and into 12 regions (FIG. 19), and the three proposed methods were checked for whether they could correctly detect the gaze direction from the eye movement made when looking at each region. First, the subject gazed at the center of the display screen 11 and a reference image was recorded. Next, as calibration, the regions of the display screen 11 were highlighted one after another (by changing them to a color different from the others), the subject followed them with the eyes, and reference images for determining direction were recorded. The subject then followed regions highlighted at random about 50 times, and the face image, eye coordinates, and gaze direction at each trial were recorded. Gaze direction detection was then attempted on the recorded images with each of the three proposed methods. To make the comparison reproducible, exactly the same images were used for all three methods.

FIGS. 20 and 21 show that correcting for head position greatly improved detection accuracy for Method II and Method III. In Method I, the gaze direction is detected from the matching correlation value, so head position correction is difficult; nevertheless, the average correct rate exceeded 80% for both the 9-direction and 12-direction cases. For subject T, the average correct rate exceeded 90% with all three methods for both 9 and 12 directions even without the correction terms. Since it was confirmed that gaze detection works correctly with head position correction, it should be possible to broaden the range of users the system targets in the future. Each proposed method is considered in turn below. For the direction-specific image correlation method (Method I), FIGS. 20(a) and 21(a) show that gaze detection accuracy was slightly better for 12 directions than for 9. Almost all misrecognitions were off by one cell, either horizontally or vertically. Without head position correction, the gaze direction was detected with an accuracy of 85% or more for both 9 and 12 directions. Because head position correction is difficult for this method, its correct rate is lower than those of Methods II and III after correction, but it still achieves 85% or more without any correction. For the black pixel region detection method (Method II), head position correction raised the correct rate to 90% or more for both 9 and 12 directions. Most misrecognitions with this method were in the vertical direction. In the vertical direction a larger part of the eyeball is hidden by the eyelids than in the horizontal direction, which presumably makes it difficult to track the black pixel region (the iris) by template matching. The edge feature point detection method (Method III) gave results almost identical to those of the black pixel region detection method.

Next, Example 3 is described with reference to FIGS. 22 to 42. FIGS. 5 to 18 have already been described above. In Example 3, the system processing procedure from eye position detection using the difference image to determination of the gaze direction is briefly recapped, and then the part of the present invention that follows gaze direction determination is described: the gaze pointer on the display screen 11 is driven by eye functions alone (eye movement and eyelid opening and closing) instead of a mouse, and switching operations make a so-called virtual keyboard usable, so that a person with severe physical disabilities, such as an ALS patient, can work from home using eye functions.

In particular, because the ALS patient 20 has impaired limbs, the patient has no means of using an input device to operate a PC. Speaking is also difficult, so operating a PC by voice input is not possible either. In the present invention, the eye movement and eyelid opening and closing that remain available to the ALS patient 20 are used as input for PC operation, which provides a flexible PC operating environment with general-purpose PC software and makes it possible for people with severe physical disabilities, such as the ALS patient 20, to work from home. Although the ALS patient 20 is described here, the invention need not be limited to ALS patients; the same approach applies to people with similarly severe physical disabilities.

Thus, the present invention is aimed at enabling work from home using only the eye functions of the ALS patient 20. Input signals that are ordinarily sent to the PC with a mouse are, in the present invention, generated and sent by eye movement and eyelid opening and closing. The gaze pointer on the display screen 11 is thereby driven by eye functions (eye movement and eyelid opening and closing only) instead of a mouse, and switching operations make a so-called virtual keyboard usable. If a severely affected ALS patient 20 can only move the eyes and blink, then the only part that changes in the images successively captured by the video camera 12 (the input images) is the area around the eyes. The gaze direction is therefore determined from the relative distance between reference coordinates set appropriately in the input image and coordinates that change with eye movement, and from the intensity error between the input image and the registered direction-specific images. The processing procedure is the same as that described in detail with reference to FIGS. 5 to 21; a brief outline follows.

First, only the eye region is extracted from the face image of the severely affected ALS patient 20, which includes the background, and is enlarged. Next, a 120 × 80 pixel region containing the iris is registered as a template. Once the template is registered, the eye position can be tracked automatically. The ALS patient 20 is asked to open and close the eyelids for 5 seconds, and a threshold for discriminating open from closed is registered. Once this threshold is registered, the open or closed state of the eyelids can be determined. Then, after calibration, image processing is applied to the input image, and the result of the analysis is compared with the data registered during calibration to detect the gaze direction. Eyelid opening and closing is used to enter the gaze direction: if the eyelids remain closed for at least a set time, the closure is judged to be a deliberate blink and the gaze direction immediately before the eyelids closed is adopted.
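The selection rule in the last sentences (a sufficiently long eyelid closure confirms the gaze direction seen just before the eyes closed) can be sketched as a small state machine. The sample format, the function name, and the 1-second default threshold are assumptions for illustration; the text above only says "a set time".

```python
def select_on_blink(samples, min_closed_s=1.0):
    """Return the gaze direction confirmed by a deliberate blink, or None.

    samples: iterable of (timestamp_s, eye_open, direction) in time order.
    min_closed_s: hypothetical minimum closure time for a deliberate blink.
    """
    last_open_direction = None
    closed_since = None
    for t, eye_open, direction in samples:
        if eye_open:
            # Remember the most recent direction seen with the eyes open.
            last_open_direction = direction
            closed_since = None
        else:
            if closed_since is None:
                closed_since = t
            if (t - closed_since >= min_closed_s
                    and last_open_direction is not None):
                return last_open_direction
    return None
```

Short, involuntary blinks never reach the threshold and are ignored, so ordinary blinking does not trigger a selection.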

The subsequent operation is described with reference to FIG. 22. FIG. 22 is a flowchart from acquisition of the gaze direction to use of the virtual keyboard or of an application (hereinafter, "application" refers to the various software installed on the PC).

As shown in FIG. 22, after the gaze direction is determined (200), the gaze pointer is moved (202) and it is judged whether the pointer has reached the target (204). If it has not, the gaze pointer is moved again (202). If it has, the user makes a deliberate blink (206) and that state is sent as a command (208). The command is used for the virtual keyboard (210), for an application (212), or for both in combination.

A usage example of the virtual keyboard is described briefly. First, the gaze pointer is moved to the region of the target character. A deliberate blink is then made to enter the character, and the character is entered at the target input location via the virtual keyboard.

Next, a usage example of an application is described briefly, taking as the example the procedure for launching Microsoft Internet Explorer (hereinafter IE) on the desktop. First, the gaze pointer is moved onto the IE icon. A deliberate blink is then made to launch IE, and the IE icon receives the launch command via the gaze pointer and starts.

First, methods for using applications on the display screen 11 with the gaze pointer are described. The present invention proposes three methods: (1) the pointer-vicinity enlargement method, (2) the screen scroll method, and (3) the divided-region enlargement method. Their main purpose is to overcome the drawbacks that, compared with a mouse pointer, the gaze pointer is harder to operate near the application to be launched and takes longer to make a selection.

First, the pointer-vicinity enlargement method is described. In this method, the area around the pointer is displayed in the region that lies straight ahead of the user's gaze when the pointer is not being moved (the central region of the screen) (FIG. 23), so that the user can operate the pointer while checking the current pointer position and the position of the application to be launched on the display screen 11 (FIGS. 24 and 25). This enables fine position adjustment comparable to operating the pointer with an ordinary mouse. In addition, because the area around the pointer is enlarged and displayed at the center of the screen, the positions of the gaze pointer and the application are visually clear. After the gaze pointer reaches the application to be launched, a deliberate blink of 3 seconds or longer launches the application (FIG. 26).
The travel time of the gaze pointer grows in proportion to the travel distance, and the difference from mouse operation becomes pronounced. The movement speed of the gaze pointer is therefore increased when the ALS patient 20 keeps looking in the same direction for 3 seconds or longer, which shortens the time needed to move to a distant position.
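The speed-up rule can be sketched as a per-update step function. Only the 3-second threshold comes from the description; the step sizes in pixels and the function name are hypothetical values chosen for illustration.

```python
def pointer_step(direction, held_s, base_px=5, fast_px=20, boost_after_s=3.0):
    """Pointer displacement for one gaze-detection update.

    direction: unit step such as (1, 0) for right or (0, -1) for up.
    held_s: how long the gaze has stayed in this same direction, in seconds.
    base_px, fast_px: hypothetical normal and accelerated step sizes.
    """
    step = fast_px if held_s >= boost_after_s else base_px
    return (direction[0] * step, direction[1] * step)

print(pointer_step((1, 0), 1.0))   # (5, 0): normal speed
print(pointer_step((0, -1), 3.0))  # (0, -20): accelerated after 3 s
```

Keeping the normal step small preserves fine control near the target, while the accelerated step covers long distances quickly.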

Next, in the screen scroll method, in contrast to the pointer-vicinity enlargement method, it is not the gaze pointer but the application to be launched that is moved, to the center of the screen where the gaze pointer stays fixed, and selected there. A region for selecting the application to be launched (the selection region) is provided at the center of the display screen 11 (FIG. 27), and the application is moved into that region (FIG. 28). The display screen 11 scrolls and the whole screen is always shown on the monitor (FIG. 29), so the usable screen area is larger than with the pointer-vicinity enlargement method. After the application to be launched reaches the selection region, a deliberate blink of 3 seconds or longer launches it (FIG. 30).
Furthermore, because this method moves the application to the center of the display screen 11, travel time is not much of a problem. Nevertheless, considering that applications are usually placed near the edges of the screen, the same scheme as in the pointer-vicinity enlargement method was adopted: the scroll speed increases when the user's gaze stays in the same direction for 3 seconds or longer.

Finally, in the divided-region enlargement method, the initial screen is displayed divided into nine regions in advance (FIG. 31), and the ALS patient 20 selects the region containing the application to be launched by gazing at it (FIG. 32). To address the travel time of the gaze pointer, only the area around the application is extracted and enlarged to fill the monitor (FIG. 33), and the display screen 11 is scrolled at high speed to bring the application near the center of the display screen 11 (FIG. 34). Pointer operation near the application is also accurate, because the application is located near the center of the display screen 11 and the pointer is displayed and operated on a screen in which the surrounding area is enlarged nine times (FIGS. 35 and 36).

The processing procedure of the divided-region enlargement method is as follows.
(1) An initial screen divided into nine regions is displayed.
(2) The user gazes at the region containing the application to be launched. When the gaze has been held for 2 seconds or longer, the area within that region is enlarged.
(3) When the user looks at the application to be launched, the display screen 11 scrolls so that the application is positioned near the center of the monitor.
(4) After the application has moved near the center of the monitor, a deliberate blink causes the gaze pointer to be displayed at the center of the display screen 11.
(5) The user moves the gaze pointer onto the application and deliberately closes the eyelids for at least 3 seconds but less than 5 seconds, whereupon the target application is selected and launched. To return to the screen as it was before enlargement, the user deliberately closes the eyelids for 5 seconds or longer.
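Two pieces of the procedure above lend themselves to a short sketch: mapping a selected region of the nine-way split to the rectangle that gets enlarged, and interpreting the eyelid-closure time in step (5). The 3-second and 5-second thresholds come from the procedure; the function names, the action labels, the row-major region numbering, and the treatment of closures shorter than 3 seconds as "ignore" are assumptions of this sketch.

```python
def region_rect(index, screen_w, screen_h, cols=3, rows=3):
    """Rectangle (x, y, w, h) of region `index`, numbered row by row from 0."""
    row, col = divmod(index, cols)
    w, h = screen_w // cols, screen_h // rows
    return (col * w, row * h, w, h)

def blink_action(closed_s):
    """Interpret a deliberate eyelid closure of `closed_s` seconds (step 5)."""
    if closed_s >= 5.0:
        return "back"    # return to the screen before enlargement
    if closed_s >= 3.0:
        return "launch"  # select and launch the application under the pointer
    return "ignore"      # ASSUMPTION: shorter closures do nothing at this step

print(region_rect(4, 1024, 768))  # (341, 256, 341, 256): the centre region
print(blink_action(4.0))          # launch
```

Enlarging one of the nine regions to fill the monitor is what yields the ninefold magnification mentioned in the description of this method.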

To verify the practicality of the gaze pointer, the following experiment was conducted with seven able-bodied subjects. Each subject 20 was asked to launch IE on the screen using the gaze pointer with each method applied. Each subject 20 made 10 attempts with each method, and the success or failure of launching IE and the time elapsed until IE was selected were recorded (FIGS. 37, 38, and 39). In ordinary PC use, shortcuts to frequently launched applications are often created and displayed on the left side of the screen. If only launching applications is considered, the travel range of the gaze pointer can therefore be limited to the left side of the screen. Accordingly, the initial position of the gaze pointer was set near the center of the display screen 11 and IE was placed near the upper-left corner of the display screen 11. The purpose of this study was to determine whether the proposed methods mitigate the two problems of the gaze pointer: the time lost in moving the pointer and the poor operability near the application to be launched.

Table 1 shows whether IE was successfully launched with the gaze pointer. Table 1 confirms that IE can be launched with the gaze pointer using each of the three methods proposed in the present invention. Considering that most of the subjects 20 were using a gaze pointer for the first time, the proposed gaze pointer can be regarded as highly practical. As for operability, which had been an issue for the gaze pointer, the pointer moved in the direction the subject 20 intended without malfunction, so operation was accurate as well as intuitive and easy to understand. Pointer movement near IE became smoother as the number of uses increased. This is also evident from the fact that the average pointer travel time for the 6th to 10th trials (FIGS. 41, 42, and 43) was about 5 seconds shorter than that for the 1st to 5th trials (FIGS. 38, 39, and 40).

The travel time of the gaze pointer averaged about 10 seconds across all subjects for all three methods. This is undeniably slower than ordinary mouse operation, but it poses no obstacle to selecting an application. The movement speed would also increase if the distance the pointer moves per gaze detection were made larger. In this study the step size was kept small to limit the movement speed, because many of the subjects 20 were using a gaze pointer for the first time; raising the speed as a subject becomes accustomed to the pointer and setting the optimum speed for each subject 20 should make operation more comfortable still. In that case, the step size must be chosen with operability near the application to be selected in mind.

本発明に係る眼球運動を用いた視線入力コミュニケーションシステムのハードウェア構成図である。It is a hardware block diagram of the gaze input communication system using the eye movement which concerns on this invention. 視線入力式コミュニケーションシステムの概要図である。It is a schematic diagram of a gaze input type communication system. コミュニケーションスクリーン（初期画面）の一例として、９分割画面を示す図である。It is a figure which shows 9 division | segmentation screen as an example of a communication screen (initial screen). 図３のコミュニケーションスクリーンにおいて「テレビ」が選ばれた場合のコミュニケーションスクリーンの一例を図４（a）に示し、さらに図４（a）の画面で「チャンネルを変えて」を選択した場合のコミュニケーションスクリーンの一例（図４（b））を示した図である。An example of the communication screen when “TV” is selected in the communication screen of FIG. 3 is shown in FIG. 4A, and further when “Change channel” is selected on the screen of FIG. 4A. It is the figure which showed an example (FIG.4 (b)). システムの処理手順を示すフロー図である。It is a flowchart which shows the process sequence of a system. 個人識別手順を示すフロー図である。It is a flowchart which shows a personal identification procedure. 入力画像における(a)は顔全体を示し(b)は目周辺領域を示す拡大図である。(A) in the input image shows the entire face, and (b) is an enlarged view showing the eye peripheral region. 差画像を示す図である。It is a figure which shows a difference image. 目の位置検出を示す図である。It is a figure which shows the position detection of an eye. サイズ別テンプレート画像を示す図である。It is a figure which shows the template image classified by size. 画像サイズ別テンプレートマッチングを示す図である。It is a figure which shows the template matching according to image size. 登録されたテンプレート画像の例を示す図である。It is a figure which shows the example of the registered template image. マッチング結果を示す図である。It is a figure which shows a matching result. 方向別画像相関法（method I）における９方向別テンプレート画像の一例を示す図である。It is a figure which shows an example of the template image classified by 9 directions in the image correlation method classified by direction (method I). 黒画素領域検出法（method II）における高速テンプレートマッチングによる黒目追従の一例を示す図である。It is a figure which shows an example of the black eye tracking by the high-speed template matching in a black pixel area | region detection method (method II). 
A diagram showing an example of the preprocessing in the edge feature point detection method (method III).
A diagram showing an example of edge detection by a Sobel filter in the edge feature point detection method (method III).
A diagram showing an example of the detection of four contact points in the edge feature point detection method (method III).
A diagram showing the experiment screen in the gaze direction detection experiment with a subject.
A diagram showing an example of the 9-direction gaze detection result in the gaze direction detection experiment with a subject.
A diagram showing an example of the 12-direction gaze detection result in the gaze direction detection experiment with a subject.
A flowchart from acquisition of the gaze direction to use of the virtual keyboard or use of an application.
A screen diagram in which the display area near the gaze pointer is enlarged and displayed at the center of the screen.
A screen diagram in which the gaze pointer is moved to the vicinity of the application to be activated.
A screen diagram in which the position is confirmed in the window at the center of the display screen and the gaze pointer is moved onto the application.
A screen diagram in which the selected application moves when the eyelids are consciously closed for 3 seconds or more.
A screen diagram displaying, at the center of the display screen, a selection determination area for selecting the application to be activated.
A screen diagram in which, when the gaze is directed to the application to be activated, the display screen scrolls so that the application is positioned near the center of the monitor.
A screen diagram in which the selection determination area is always displayed at a fixed position in the center of the monitor and the application to be activated is moved into the selection determination area.
A screen diagram in which the application inside the area starts when the eyelids are consciously closed for 3 seconds or more.
The initial screen.
A display screen diagram for gazing at and selecting the section containing the application.
A screen diagram in which the selected section is enlarged.
A screen diagram for scrolling the screen by directing the gaze toward the application.
A screen diagram in which a conscious blink is performed and the pointer is displayed.
A screen diagram for moving the pointer onto the application and starting it.
A measurement result diagram for the pointer neighborhood area enlargement method (average of the 1st to 5th measurements).
A measurement result diagram for the screen scroll method (average of the 1st to 5th measurements).
A measurement diagram for the divided area enlargement method (average of the 1st to 5th measurements).
A measurement result diagram for the gaze pointer neighborhood area enlargement method (average of the 6th to 10th measurements).
A measurement result diagram for the screen scroll method (average of the 6th to 10th measurements).
A measurement diagram for the divided area enlargement method (average of the 6th to 10th measurements).

Explanation of symbols

11 Display screen
12 Video camera
14 Image capture device
16 Camera control device
18 Arithmetic processing unit
20 ALS patient (subject)

Claims

1. A gaze input communication method using eye movement, characterized in that: from images capturing the entire face of a subject, a difference image is computed by taking, for each pixel, the difference in value between an image in which the subject's eyes are open and an image in which they are closed, and creating a new image having these differences as its pixel values; the coordinates of the center of the eye are obtained from the difference image; templates of the eye and eyebrow are registered from those coordinates; and, during calibration, the personal computer screen is divided into several areas, an image is registered for each direction in which the subject looks at one of the areas, the position of the iris (the dark part of the eye) is obtained by an image processing method, and the relative distance between the iris and the eyebrow is obtained as the reference for each direction used to detect the gaze direction.
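
The difference-image step recited above can be illustrated with a minimal, non-limiting sketch. It assumes grayscale frames of a single eye region stored as lists of rows of integer pixel values. The function names, the threshold of 50, and the use of a centroid to estimate the eye center are assumptions made for illustration and are not taken from the specification.

```python
def difference_image(open_img, closed_img):
    """Per-pixel absolute difference between the eyes-open and eyes-closed frames."""
    return [
        [abs(a - b) for a, b in zip(row_open, row_closed)]
        for row_open, row_closed in zip(open_img, closed_img)
    ]


def eye_center(diff, threshold=50):
    """Centroid (x, y) of the pixels whose difference exceeds the threshold.

    Returns None when no pixel changed enough (for example, no blink occurred).
    The default threshold is a hypothetical value.
    """
    xs, ys = [], []
    for y, row in enumerate(diff):
        for x, value in enumerate(row):
            if value > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Tiny 4x5 example: only the dark "eye" pixels differ between the two frames.
open_frame = [
    [200, 200, 200, 200, 200],
    [200, 30, 30, 30, 200],
    [200, 30, 30, 30, 200],
    [200, 200, 200, 200, 200],
]
closed_frame = [[200] * 5 for _ in range(4)]

diff = difference_image(open_frame, closed_frame)
center = eye_center(diff)
```

In this toy example the changed pixels form a 2x3 block, so the estimated center falls in the middle of that block. With a full-face image containing both eyes, the changed pixels would first have to be separated into one region per eye.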

2. The gaze input communication method using eye movement according to claim 1, wherein the calibration and the acquisition of the gaze direction are performed on an image in which the region around the eyes is captured at an enlarged scale by the zoom-in function of the camera, while the eye position is continuously tracked using high-speed template matching.
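
As a non-limiting illustration of tracking by template matching, the sketch below searches for the template only within a small window around the last known position, which is one common way to keep matching fast. The sum-of-absolute-differences score, the search radius, and the names `sad` and `track` are assumptions for illustration; the claim does not specify how the high-speed matching is implemented.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and the image patch at (top, left)."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def track(image, template, last_pos, radius=2):
    """Return the (top, left) position with the lowest SAD score.

    Only positions within `radius` pixels of `last_pos` are examined,
    clipped so that the template always lies inside the image.
    """
    max_top = len(image) - len(template)
    max_left = len(image[0]) - len(template[0])
    last_top, last_left = last_pos
    best_pos, best_score = None, None
    for top in range(max(0, last_top - radius), min(max_top, last_top + radius) + 1):
        for left in range(max(0, last_left - radius), min(max_left, last_left + radius) + 1):
            score = sad(image, template, top, left)
            if best_score is None or score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos


# 6x6 bright image with a 2x2 dark block whose top-left corner is at row 3, column 2.
image = [[200] * 6 for _ in range(6)]
for r in (3, 4):
    for c in (2, 3):
        image[r][c] = 20
template = [[20, 20], [20, 20]]

position = track(image, template, last_pos=(2, 1))
```

Here the dark block has moved one pixel down and one pixel right from the last known position, and the windowed search finds it without scanning the whole image.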

3. The gaze input communication method using eye movement according to claim 1, wherein, after the calibration, on the personal computer screen divided into several areas, a gaze pointer is operated in place of a mouse by means of eye movement and the opening and closing of the eyelids, and a virtual keyboard can be operated by a switching operation.

4. The gaze input communication method using eye movement according to claim 1, wherein, when the subject gazes for 2 seconds or more at the section of the partitioned screen display that contains the application to be activated, the section being gazed at is enlarged; when the subject's gaze is held in the same direction for 3 seconds or more, the scrolling speed is increased; the application to be activated is moved to near the center of the display screen by the high-speed scrolling; and, when a conscious blink is then performed, the gaze pointer is moved so as to be positioned near the center of the display screen.
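
The dwell-time conditions of 2 seconds and 3 seconds recited above can be sketched as follows. This is a minimal illustration under stated assumptions: gaze samples arrive as time-ordered (timestamp in seconds, target) pairs, and the function names and the base and fast scroll speeds are hypothetical values that do not come from the specification.

```python
ENLARGE_DWELL_S = 2.0      # gaze on a section for 2 s or more: enlarge that section
FAST_SCROLL_DWELL_S = 3.0  # gaze in the same direction for 3 s or more: scroll faster


def held_duration(samples):
    """Seconds for which the most recent gaze target has been held without interruption.

    `samples` is a time-ordered list of (timestamp_seconds, target) pairs.
    """
    if not samples:
        return 0.0
    last_time, last_target = samples[-1]
    start = last_time
    for timestamp, target in reversed(samples):
        if target != last_target:
            break
        start = timestamp
    return last_time - start


def should_enlarge(samples):
    """True once the current section has been gazed at long enough to enlarge it."""
    return held_duration(samples) >= ENLARGE_DWELL_S


def scroll_speed(samples, base_speed=1.0, fast_speed=3.0):
    """Scroll speed for the current gaze direction (speed values are hypothetical)."""
    if held_duration(samples) >= FAST_SCROLL_DWELL_S:
        return fast_speed
    return base_speed


samples = [(0.0, "left"), (0.5, "right"), (1.0, "right"), (2.0, "right"), (3.0, "right")]
enlarge_now = should_enlarge(samples)   # "right" has been held for 2.5 s
speed_now = scroll_speed(samples)       # still below the 3 s condition
speed_later = scroll_speed(samples + [(4.0, "right")])  # "right" held for 3.5 s
```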