JP6916849B2

JP6916849B2 - Information processing equipment, information processing methods and information processing programs

Info

Publication number: JP6916849B2
Application number: JP2019167394A
Authority: JP
Inventors: 綾塚　祐二; 祐二綾塚
Original assignee: CRESCO,Inc.
Current assignee: CRESCO,Inc.
Priority date: 2019-09-13
Filing date: 2019-09-13
Publication date: 2021-08-11
Anticipated expiration: 2039-09-13
Also published as: JP2021043881A

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

Image recognition techniques that use machine learning to recognize objects and the like appearing in an image are known. In the medical field, for example, there are techniques for recognizing lesions and the like in various medical images, such as X-ray or CT images of the human body. In manufacturing, there are techniques for recognizing foreign matter mixed into a product, or defects in the product, from an image of the product (see, for example, Patent Document 1).

In machine learning, annotation may be performed, in which judgments made in advance by a human are supplied as training data (teacher data). For example, in annotation for recognizing text content, text data such as words and phrases used to classify the text is annotated as training data.

Likewise, in annotation for image recognition, image data used to classify images is annotated as training data. For example, in image recognition that distinguishes the part of an image showing a specific object from the rest of the image, the image part showing the specific object is annotated as a "positive example", or the remaining image part is annotated as a "negative example". The outcome of annotation is called an annotation result. A machine learning model (engine) can classify and recognize the specific object and everything else in an image based on the positive-example annotation results or the negative-example annotation results.

Improving recognition accuracy in machine learning may require a large amount of accurately annotated training data. In particular, the amount of data per image has been increasing as the quality of captured images has improved in recent years. Analyzing a large amount of high-quality training data may therefore require a computer with high processing power.

Patent Document 1: JP-A-2018-032071

For example, in the medical field, annotating a lesion or similar part of a captured image requires the annotator to have a high degree of expertise, enough to recognize the lesion by looking at the image. Consequently, when highly specialized people (for example, doctors) are in short supply, the annotation work may concentrate on them and increase their workload.

Furthermore, when an image classification produced by machine learning is wrong, a human has to correct it manually. If the computer's processing power is insufficient, the machine learning process takes a long time. The person making corrections cannot start correcting the classification result until that process finishes, so work efficiency drops. This loss of efficiency may lengthen the annotator's working time and increase the workload.

The present invention has been made in view of the above circumstances, and one of its objects is to provide an information processing device, an information processing method, and an information processing program capable of reducing the workload of annotation.

(1) To solve the above problems, an information processing device comprises: a providing unit that provides a user interface (UI) that displays an image to be processed in machine learning and allows a user to annotate the image; an acquisition unit that acquires an annotation result in which the user has specified a region of the image displayed in the UI provided by the providing unit; and a prediction unit that learns the annotation result acquired by the acquisition unit and predicts a classification of the image. The providing unit provides a UI that displays a first image; the acquisition unit acquires a first annotation result in which the user has specified a region of the first image; the prediction unit predicts a first image classification based on the first annotation result acquired by the acquisition unit; the providing unit provides a UI that displays a second image including the first image classification predicted by the prediction unit; the acquisition unit acquires a second annotation result in which the user has specified a region of the second image; the prediction unit further learns the second annotation result acquired by the acquisition unit and predicts a second image classification; and the providing unit provides a UI that displays a third image including the second image classification predicted by the prediction unit.

(2) In the information processing device of the embodiment, the acquisition unit may acquire a second annotation result in which the user has specified a region in the second image so as to correct the first image classification, and the prediction unit may further learn the second annotation result in which the first image classification has been corrected and predict a second image classification that corrects the first image classification.

(3) In the information processing device of the embodiment, the acquisition unit may acquire an annotation result that includes the size of the region specified by the user, and the prediction unit may change, according to that size, the size of the region for which the image classification is predicted.

(4) In the information processing device of the embodiment, the acquisition unit may acquire an annotation result that includes, as the size, the length of the region specified by the user.

(5) In the information processing device of the embodiment, the prediction unit may extract, based on the annotation result, a feature amount of the region specified by the user, and predict the image classification of regions whose feature amounts are similar to the extracted feature amount.

(6) In the information processing device of the embodiment, the prediction unit may extract the feature amount from the region specified by the user based on at least one of the mean of the pixel information, the variance of the pixel information, and the variance of the edge-enhanced pixel information.

(7) In the information processing device of the embodiment, the prediction unit may extract the feature amount based on weights set respectively for the mean of the pixel information, the variance of the pixel information, and the variance of the edge-enhanced pixel information.

(8) To solve the above problems, an information processing method is a method performed in an information processing device and includes: a providing step of providing a user interface (UI) that displays an image to be processed in machine learning and allows a user to annotate the image; an acquisition step of acquiring an annotation result in which the user has specified a region of the image displayed in the UI provided in the providing step; and a prediction step of learning the annotation result acquired in the acquisition step and predicting a classification of the image. The method includes: a step of providing a UI that displays a first image; a step of acquiring a first annotation result in which the user has specified a region of the first image; a step of predicting a first image classification based on the acquired first annotation result; a step of providing a UI that displays a second image including the predicted first image classification; a step of acquiring a second annotation result in which the user has specified a region of the second image; a step of further learning the acquired second annotation result and predicting a second image classification; and a step of providing a UI that displays a third image including the predicted second image classification.

(9) To solve the above problems, an information processing program causes a computer to realize: a providing function of providing a user interface (UI) that displays an image to be processed in machine learning and allows a user to annotate the image; an acquisition function of acquiring an annotation result in which the user has specified a region of the image displayed in the UI provided by the providing function; and a prediction function of learning the annotation result acquired by the acquisition function and predicting a classification of the image. The providing function provides a UI that displays a first image; the acquisition function acquires a first annotation result in which the user has specified a region of the first image; the prediction function predicts a first image classification based on the first annotation result acquired by the acquisition function; the providing function provides a UI that displays a second image including the first image classification predicted by the prediction function; the acquisition function acquires a second annotation result in which the user has specified a region of the second image; the prediction function further learns the second annotation result acquired by the acquisition function and predicts a second image classification; and the providing function provides a UI that displays a third image including the second image classification predicted by the prediction function.
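Items (6) and (7) above describe feature amounts built from three statistics, each with its own weight. The following is a minimal sketch of one way such weights could enter a similarity comparison. The weight values, the dictionary keys, and the use of a weighted Euclidean distance are all assumptions made for illustration; the source only states that weights are set for each statistic.

```python
from math import sqrt

# Hypothetical weights for the three statistics named in items (6) and (7).
WEIGHTS = {"mean": 1.0, "variance": 0.5, "edge_variance": 2.0}


def weighted_distance(a, b, weights=WEIGHTS):
    """Weighted Euclidean distance between two feature dictionaries.

    A weight of 0 removes that statistic from the comparison, which also
    covers the "at least one of" wording in item (6).
    """
    return sqrt(sum(w * (a[k] - b[k]) ** 2 for k, w in weights.items()))


if __name__ == "__main__":
    stroke = {"mean": 120.0, "variance": 30.0, "edge_variance": 80.0}
    candidate = {"mean": 118.0, "variance": 35.0, "edge_variance": 75.0}
    print(weighted_distance(stroke, candidate))
```

A smaller distance would indicate that the candidate region resembles the region the user marked.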

According to one embodiment of the present invention, the workload of annotation can be reduced by acquiring a first annotation result in which the user has specified a region of a first image, predicting a first image classification based on the acquired first annotation result, providing a UI that displays a second image including the predicted first image classification, acquiring a second annotation result in which the user has specified a region of the second image, further learning the acquired second annotation result to predict a second image classification, and providing a UI that displays a third image including the predicted second image classification.

A block diagram showing an example of the software configuration of the information processing device in the embodiment.

A flowchart showing an example of the operation of the information processing device in the embodiment.

A block diagram showing an example of the hardware configuration of the information processing device in the embodiment.

A diagram showing a brush stroke (A) and the predicted image classification (B) in the first example.

A diagram showing a brush stroke (A) and the predicted image classification (B) in the second example.

A diagram showing a brush stroke (A) and the predicted image classification (B) in the third example.

A diagram showing the initial annotation in the fourth example.

A diagram showing the predicted image classification in the fourth example.

A diagram showing the corrective annotation in the fourth example.

A diagram showing the re-predicted image classification in the fourth example.

A diagram showing the division of regions in the fourth example.

A diagram showing the original image (A), the annotation result (B), and the predicted image classification (C) in the fifth example.

Hereinafter, an information processing device, an information processing method, and an information processing program according to an embodiment of the present invention will be described in detail with reference to the drawings.

First, the functions of the information processing device will be described with reference to FIG. 1. FIG. 1 is a block diagram showing an example of the software configuration of the information processing device in the embodiment.

In FIG. 1, the information processing device 1 is connected to a display device 2 and an input device 3. The information processing device 1 is also communicably connected to a server 4 via a network 9. The display device 2 displays a user interface (UI) that allows a user to annotate an image to be processed in machine learning, and is, for example, a liquid crystal display. The input device 3 is a device with which a user of the information processing device 1 (hereinafter sometimes simply "the user") enters annotation instructions, and is, for example, a keyboard, a mouse, or a touch panel. The server 4 exchanges data with the information processing device 1. For example, the server 4 may provide the image data to be processed by the information processing device 1, or may store the results of processing by the information processing device 1. Although FIG. 1 illustrates a system configuration that includes the information processing device 1, the system configuration in the embodiment is not limited to this. For example, the system may have a plurality of information processing devices 1 connected to the server 4. The network 9 is a wired or wireless communication path, and any communication means, such as the communication protocol, may be used.

The information processing device 1 has the following functional units: a providing unit 11, an acquisition unit 12, a storage unit 13, a prediction unit 14, and a communication control unit 15. In the present embodiment, each of these functional units is described as a functional module realized by the information processing program (software) of the present embodiment.

The providing unit 11 provides a UI that allows the user to annotate an image to be processed in machine learning (the image to be processed). Annotation is the work, performed by the user, of creating training data for the image to be processed. In the present embodiment, it means specifying regions of the image to be processed that serve as training data, using a binary or multi-valued classification (sometimes called an "image classification"). In annotation, tagging is performed to associate the feature amount of the image in the specified region with a tag indicating the image classification.

Here, the feature amount of an image is, for example, the mean of the pixel information, the variance of the pixel information, or the variance of the edge-enhanced pixel information. The pixel information is, for example, the RGB value of a pixel. An RGB value represents, for example, each of red, green, and blue as a number from 0 to 255. The pixel information may instead be the numerical values of each component in another color space (for example, YMCK values or HSV values).
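The three feature amounts named above can be illustrated with a short sketch. It assumes a single-channel region given as a 2D list of intensities (the source mentions RGB values; one channel is used here for brevity) and uses a 3x3 sharpening kernel as the edge enhancement, which is an assumption since the source does not say which filter is used. The function names are hypothetical.

```python
from statistics import fmean, pvariance


def sharpen(gray):
    """Edge-enhance a 2D list of intensities with a 3x3 sharpening kernel.

    Assumed filter: 5 * centre minus the four direct neighbours, with the
    border pixels replicated outward.
    """
    h, w = len(gray), len(gray[0])

    def px(y, x):
        # Clamp coordinates so that pixels outside the region repeat the edge.
        return gray[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [
            5 * px(y, x) - px(y - 1, x) - px(y + 1, x) - px(y, x - 1) - px(y, x + 1)
            for x in range(w)
        ]
        for y in range(h)
    ]


def region_features(gray):
    """Return (mean, variance, variance after edge enhancement) for a region."""
    flat = [v for row in gray for v in row]
    edges = [v for row in sharpen(gray) for v in row]
    return (fmean(flat), pvariance(flat), pvariance(edges))


if __name__ == "__main__":
    smooth = [[10, 10], [10, 10]]
    textured = [[0, 255], [255, 0]]
    print(region_features(smooth))
    print(region_features(textured))
```

In this sketch a flat region has zero variance both before and after edge enhancement, while a textured region has a large edge-enhanced variance, so the third value helps separate regions that share a similar mean.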

Tagging means attaching metadata indicating an image classification to the feature amount of an image. For example, in annotation for recognizing a specific object contained in the image to be processed, the user specifies the region of the image that shows the specific object, and the feature amount of the image in that region is tagged with the "positive example" image classification. The feature amount of the image in the remaining region is tagged with the "negative example" image classification. The examples below illustrate annotation that specifies a binary image classification, with "positive example" regions and "negative example" regions, for the image to be processed, but this does not exclude annotation that specifies a multi-valued image classification.

The UI provided by the providing unit 11 allows the user to perform annotation that specifies, for the image to be processed, the image classification of a positive-example region (hereinafter sometimes abbreviated to "positive example") or a negative-example region (hereinafter sometimes abbreviated to "negative example"). For example, in annotation for image recognition that recognizes a specific object in an image, the region of the image showing that object is specified as the positive-example image classification, and the region of the image other than that object is specified as the negative-example image classification.

The providing unit 11 also provides a UI that displays the image to be processed and includes a designation tool for specifying a positive-example or negative-example image classification for that image. The designation tool is, for example, similar to a paint brush in a paint application. In a paint application, the user can move the cursor over the image and draw the trajectory of the cursor. The thickness of the trajectory can be changed, for example, through the cursor size or thickness setting. The length of the trajectory varies with the distance the cursor is moved. The UI makes it possible to specify the image classification of a region with the same operation as a paint brush in a paint application. The image classification of a region is specified, for example, by the user operating a mouse or the like on the image to be processed displayed on the display device 2. In other words, the user can freely specify a positive-example or negative-example classification for any region of the image to be processed by drawing on it with a paint-brush-like tool. A classification that has been specified for a region may be made cancelable by a cancel operation such as undo. In the examples below, the paint-brush-like tool provided in the UI may be referred to as the "paint brush", and an object drawn with the "paint brush" may be referred to as an "object".

The providing unit 11 also provides a UI that displays an image including the image classification predicted by the prediction unit 14, described later, and allows the user to annotate the displayed image again. The predicted image classification is information indicating which image classification each region of the image to be processed belongs to, and an image including the predicted image classification is an image that shows the predicted image classification on the image to be processed. The predicted image classification can be displayed, for example, by superimposing information indicating the image classification on the image to be processed. The information indicating the image classification is, for example, a color assigned to each image classification. It may instead be a hatching pattern that differs for each image classification, a grayscale density that differs for each image classification, or the like. The image classification displayed in the UI lets the user identify which region of the image to be processed was predicted to belong to which image classification. For example, when the predicted image classification is displayed by superimposing colors, painting positive-example regions red and negative-example regions blue lets the user distinguish the regions predicted as positive examples from the regions predicted as negative examples. The image including the predicted image classification may be generated by the providing unit 11 (including the UI) or by the prediction unit 14; the component that generates it is not limited.
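The color superimposition described above can be sketched as a per-pixel blend. This is only an illustration: the label constants, the `overlay` function, the blend ratio `alpha`, and the representation of the image as nested lists of RGB tuples are assumptions, not details given in the source. Only the red/blue assignment comes from the text.

```python
POSITIVE, NEGATIVE, UNPREDICTED = 1, 0, None

# Red for positive examples and blue for negative examples, as in the text.
TINT = {POSITIVE: (255, 0, 0), NEGATIVE: (0, 0, 255)}


def overlay(image, labels, alpha=0.4):
    """Blend a class color into each pixel that has a predicted label.

    image  : 2D list of (r, g, b) tuples
    labels : 2D list of POSITIVE, NEGATIVE or UNPREDICTED, same shape
    alpha  : assumed blend ratio; 0 keeps the image, 1 shows only the tint
    """
    result = []
    for image_row, label_row in zip(image, labels):
        row = []
        for rgb, label in zip(image_row, label_row):
            tint = TINT.get(label)
            if tint is None:
                # Pixels without a prediction are shown unchanged.
                row.append(rgb)
            else:
                row.append(
                    tuple(round((1 - alpha) * c + alpha * t) for c, t in zip(rgb, tint))
                )
        result.append(row)
    return result


if __name__ == "__main__":
    gray = (100, 100, 100)
    print(overlay([[gray, gray, gray]], [[POSITIVE, NEGATIVE, UNPREDICTED]]))
```

Leaving unpredicted pixels untouched keeps predicted and not-yet-predicted regions visually distinguishable in the same view.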

An image including the predicted image classification may contain regions for which no image classification has been predicted. That is, such an image is displayed so that the user can distinguish the regions of the image to be processed for which an image classification has been predicted from the regions for which none has been predicted. The user can annotate again both the regions with no predicted image classification and the regions whose image classification has already been predicted.

Here, the image displayed earlier in the UI is called the first image, the result of annotating the first image is called the first annotation result, and the image classification based on the first annotation result is called the first image classification. The image that includes the first image classification and is displayed in the UI after the first image is called the second image, the result of annotating the second image again is called the second annotation result, and the image classification based on the second annotation result is called the second image classification. The image that includes the second image classification and is displayed in the UI after the second image is called the third image. That is, the providing unit 11 provides a UI that displays the first image, the second image, and the third image. The first image is an image displayed before the second image and may, for example, include an image classification based on the n-th annotation result (n is an integer). The second image is an image displayed after the first image and may, for example, include an image classification based on the (n+1)-th annotation result. Similarly, the third image is an image displayed after the second image and may, for example, include an image classification based on the (n+2)-th annotation result.

By displaying the first image, the UI allows the first image to be annotated. By displaying the second image, the UI shows the image classification predicted from the first annotation result and allows the second image to be annotated again. By displaying the third image, the UI shows the image classification predicted from the second annotation result and allows the third image to be annotated yet again. In other words, the user can carry out annotation interactively, repeatedly correcting the annotation while checking the predicted image classification, which improves work efficiency and reduces the annotation workload.
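The cycle of first, second, and third images can be sketched as a simple loop. The function `annotation_session` and its three callback parameters are hypothetical stand-ins for the acquisition, prediction, and providing roles; the source does not prescribe this structure.

```python
def annotation_session(first_image, get_annotation, predict, render, rounds=2):
    """Run an annotate -> predict -> redisplay cycle.

    get_annotation(image) : returns the user's annotation of the shown image
    predict(annotations)  : learns from every annotation so far and returns
                            an image classification
    render(image, cls)    : returns the image with the classification added
    Returns the list of images shown, starting with the first image.
    """
    shown = [first_image]
    annotations = []
    for _ in range(rounds):
        # The user annotates whatever is currently displayed.
        annotations.append(get_annotation(shown[-1]))
        # Each round additionally learns the newest annotation result.
        classification = predict(annotations)
        shown.append(render(first_image, classification))
    return shown


if __name__ == "__main__":
    images = annotation_session(
        "original",
        get_annotation=lambda image: f"strokes drawn on {image!r}",
        predict=lambda annotations: f"classification #{len(annotations)}",
        render=lambda image, cls: (image, cls),
    )
    for image in images:
        print(image)
```

With `rounds=2` the returned list holds three entries, corresponding to the first image, the second image carrying the first image classification, and the third image carrying the second image classification.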

The image classifications may also be displayed differently depending on the annotation round. For example, in the first image classification based on the first annotation result, regions predicted as positive examples are displayed in orange and regions predicted as negative examples in blue. In the second image classification based on the second annotation result, regions predicted as positive examples may then be displayed in red and regions predicted as negative examples in dark blue.

The acquisition unit 12 acquires an annotation result in which the user has specified the classification of a region of the image displayed in the UI provided by the providing unit 11. For example, the acquisition unit 12 acquires the coordinate data of the region specified as a positive-example or negative-example classification. The annotation result acquired by the acquisition unit 12 is, for example, information on the position indicated by the tip of the object drawn with the paint brush of the UI. The annotation result also includes information on the length of the object drawn with the paint brush. The acquisition unit 12 may acquire an annotation result each time the user performs an annotation operation through the UI. For example, each time the user draws with the paint brush, it acquires the specified positive or negative classification and information on the drawn region.
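An annotation result of the kind described above could be held in a small record. The class name `BrushStroke` and its fields are hypothetical; the source only states that the result carries the classification, the position of the tip of the drawn object, and its length.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class BrushStroke:
    """One paint-brush annotation (hypothetical record layout)."""

    label: str      # "positive" or "negative"
    points: list    # cursor positions [(x, y), ...] in image coordinates
    width: float    # brush thickness in pixels

    @property
    def tip(self):
        """Position indicated by the end of the drawn object."""
        return self.points[-1]

    @property
    def length(self):
        """Length of the drawn object, summed over its segments."""
        return sum(
            hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(self.points, self.points[1:])
        )


if __name__ == "__main__":
    stroke = BrushStroke("positive", [(0, 0), (3, 4), (3, 10)], width=5.0)
    print(stroke.tip, stroke.length)
```

Creating one such record per paint-brush operation would match the per-operation acquisition described in the text.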

また、取得部１２は、第１アノテーション結果および第２アノテーション結果を取得する。取得部１２は、アノテーションが行われる度にアノテーション結果を取得することにより、予測部１４におけるリアルタイムな予測を可能にしている。 In addition, the acquisition unit 12 acquires the first annotation result and the second annotation result. The acquisition unit 12 acquires the annotation result each time the annotation is performed, thereby enabling real-time prediction in the prediction unit 14.

なお、取得部１２は、利用者がアノテーションを行う度にアノテーション結果を取得するものとして説明したが、例えば、取得部１２は、所定の時間間隔においてアノテーション結果を取得するようにしてもよい。また、取得部１２は、所定の回数アノテーションが実施されたときにアノテーション結果を取得するようにしてもよい。 Although the acquisition unit 12 has been described as acquiring the annotation result each time the user performs annotation, for example, the acquisition unit 12 may acquire the annotation result at a predetermined time interval. Further, the acquisition unit 12 may acquire the annotation result when the annotation is performed a predetermined number of times.

記憶部１３は、取得部１２において取得されたアノテーション結果を記憶する。記憶部１３は、例えば、同じ被処理画像に対する複数のアノテーション結果を時系列で記憶する。また、記憶部１３は、異なる被処理画像に対するアノテーション結果を記憶するものであってもよい。例えば、特定の臓器の画像から病巣を認識する場合においては、複数人の臓器の画像に対する複数のアノテーション結果を記憶するようにしてもよい。また、記憶部１３は、次に説明する予測部１４で予測された画像の分類を記憶するようにしてもよい。 The storage unit 13 stores the annotation results acquired by the acquisition unit 12. For example, the storage unit 13 stores a plurality of annotation results for the same image to be processed in chronological order. The storage unit 13 may also store annotation results for different images to be processed. For example, when recognizing a lesion from an image of a specific organ, a plurality of annotation results for images of that organ taken from a plurality of people may be stored. Further, the storage unit 13 may store the image classification predicted by the prediction unit 14 described next.

予測部１４は、取得部１２において取得されたアノテーション結果を学習して、画像の分類を予測する。予測部１４は、アノテーション結果に基づき、利用者が指定した領域の特徴量を抽出し、抽出した特徴量と類似する領域の画像の分類を予測する。 The prediction unit 14 learns the annotation result acquired by the acquisition unit 12 and predicts the classification of the image. The prediction unit 14 extracts the feature amount of the area designated by the user based on the annotation result, and predicts the classification of the image of the area similar to the extracted feature amount.

ここで、画像の特徴量が類似するとは、例えば、比較する画像において、ピクセルの情報の平均値、ピクセルの情報の分散値、またはエッジ強調を施したピクセルの情報の分散値の少なくともいずれか１つの値が所定の範囲内にあることをいう。画像の特徴量の類似は、例えば、１ピクセルの情報の類似であってもよく、また複数ピクセルの類似であってもよい。本実施形態においては、被処理画像を数ピクセル〜数十ピクセルのブロックで区切り、各ブロックの中のピクセルのＲＧＢ値の統計量を特徴量とする場合を例示する。ピクセルの情報がＲＧＢ値である場合、ピクセルの情報の平均値は主にブロックの色味を表し、ピクセルの情報の分散値は主にブロックに含まれる形状的なパターンの細かさを表すことができる。 Here, the feature amounts of images are said to be similar when, for example, at least one of the mean of the pixel information, the variance of the pixel information, or the variance of the edge-enhanced pixel information is within a predetermined range in the images being compared. The similarity of feature amounts may be, for example, a similarity of the information of a single pixel or a similarity over a plurality of pixels. The present embodiment illustrates a case in which the image to be processed is divided into blocks of several pixels to several tens of pixels, and statistics of the RGB values of the pixels in each block are used as the feature amount. When the pixel information is an RGB value, the mean of the pixel information mainly represents the color tone of the block, and the variance of the pixel information mainly represents the fineness of the shape pattern contained in the block.
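The block statistics described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `block_features`, the nested-list image layout, and the use of a horizontal brightness difference as the "edge enhancement" step are all assumptions made for the example.

```python
from statistics import fmean, pvariance

def block_features(image, top, left, size):
    """Return (mean, variance, edge_variance) for one square block.

    `image` is a list of rows; each pixel is an (R, G, B) tuple.
    The layout and the edge filter are assumptions for illustration.
    """
    rows = [row[left:left + size] for row in image[top:top + size]]
    # Flatten every channel value in the block into one list.
    values = [c for row in rows for px in row for c in px]
    # Stand-in edge enhancement: absolute horizontal brightness difference.
    gray = [[sum(px) / 3 for px in row] for row in rows]
    edges = [abs(r[x + 1] - r[x]) for r in gray for x in range(len(r) - 1)]
    edge_var = pvariance(edges) if edges else 0.0
    return fmean(values), pvariance(values), edge_var

# A uniformly coloured 4x4 block has zero variance in both senses.
flat = [[(10, 10, 10)] * 4 for _ in range(4)]
print(block_features(flat, 0, 0, 4))
```

The mean tracks the block's color tone and the variance tracks how busy the pattern inside the block is, which matches the roles the text assigns to them.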

予測部１４は、画像の特徴量におけるピクセルの情報の平均値、ピクセルの情報の分散値、およびエッジ強調を施したピクセルの情報の分散値の各値に対して重み付けを行い、重み付けを行った特徴量において類似する領域の画像分類を予測するものであってもよい。例えば、予測部１４は、ピクセルの情報の平均値、ピクセルの情報の分散値、およびエッジ強調を施したピクセルの情報の分散値に対して、それぞれ、５０：３０：２０の重み付けをして、算出された数値を特徴量として画像分類を予測する。 The prediction unit 14 may weight each of the mean of the pixel information, the variance of the pixel information, and the variance of the edge-enhanced pixel information in the feature amount of the image, and predict the image classification of regions that are similar in the weighted feature amount. For example, the prediction unit 14 applies weights of 50:30:20 to the mean of the pixel information, the variance of the pixel information, and the variance of the edge-enhanced pixel information, respectively, and predicts the image classification using the resulting values as the feature amount.

予測部１４は、画像の種類に応じて予め設定された重み付けを選択するようにしてもよい。例えば、周囲と色が異なる物体が含まれる画像の種類において画像中の物体を認識する処理においては、予測部１４は、ピクセルの情報の平均値の重みを大きくする設定を選択する。また、周囲と形状的なパターンが異なる物体が含まれる画像の種類において画像中の物体を認識する処理においては、予測部１４は、ピクセルの分散値の重みを大きくする設定を選択するようにしてもよい。また、輪郭が不鮮明な物体が含まれる画像の種類において画像中の物体を認識する処理においては、予測部１４は、エッジ強調を施したピクセルの情報の分散の重みを大きくする設定を選択するようにしてもよい。 The prediction unit 14 may select a preset weighting according to the type of image. For example, when recognizing an object in an image of a type that contains an object whose color differs from its surroundings, the prediction unit 14 selects a setting that increases the weight of the mean of the pixel information. When recognizing an object in an image of a type that contains an object whose shape pattern differs from its surroundings, the prediction unit 14 may select a setting that increases the weight of the variance of the pixel information. When recognizing an object in an image of a type that contains an object with a blurred outline, the prediction unit 14 may select a setting that increases the weight of the variance of the edge-enhanced pixel information.
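A short sketch of per-image-type weighting is given below. Only the 50:30:20 ratio comes from the text; the preset names and the other ratios are hypothetical, and the text does not specify how the weighted values are combined, so this example simply scales each component. The "at least one value within a predetermined range" test follows the similarity definition given earlier.

```python
# Weights for (mean, variance, edge-enhanced variance).
# Only "default" (50:30:20) appears in the text; the rest are made-up presets.
PRESETS = {
    "default": (50, 30, 20),
    "color_differs": (70, 20, 10),     # emphasise the mean (color tone)
    "pattern_differs": (20, 60, 20),   # emphasise the variance (pattern)
    "blurred_outline": (20, 20, 60),   # emphasise edge-enhanced variance
}

def weighted_features(features, image_type="default"):
    """Scale each feature component by its share of the preset weights."""
    weights = PRESETS[image_type]
    total = sum(weights)
    return tuple(w * f / total for w, f in zip(weights, features))

def is_similar(a, b, tolerance, image_type="default"):
    """True if at least one weighted component differs by <= tolerance."""
    wa = weighted_features(a, image_type)
    wb = weighted_features(b, image_type)
    return any(abs(x - y) <= tolerance for x, y in zip(wa, wb))

print(weighted_features((100, 10, 5)))  # (50.0, 3.0, 1.0)
```

Keeping the presets in a lookup table makes it easy to add a new image type without touching the comparison logic.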

予測部１４は、例えば、アノテーション結果におけるピクセルのＲＧＢ値の統計量（例えば、平均値または分散値）を特徴量とし、サポートベクターマシン（ＳＶＭ）を用いて画像分類を予測する。ＳＶＭは、教師データに基づき、２値の分類を行う技法（エンジン）であり、アノテーション結果を学習してモデルを構築し、新たな画像が２値のいずれに分類されるかを予測する。なお、予測部１４における機械学習の技法はＳＶＭに限定されるものではなく、例えば、ニューラルネットワーク、クラスタリング、またはベイジアンネットワーク等の技法を用いてもよい。 For example, the prediction unit 14 uses a statistic (for example, an average value or a variance value) of RGB values of pixels in an annotation result as a feature quantity, and predicts image classification using a support vector machine (SVM). SVM is a technique (engine) that classifies binary values based on teacher data. It learns annotation results to build a model, and predicts which of the binary values a new image will be classified into. The machine learning technique in the prediction unit 14 is not limited to SVM, and for example, a technique such as a neural network, clustering, or Bayesian network may be used.

予測部１４は、第１アノテーション結果に基づき第１画像分類を予測し、第２アノテーション結果に基づき第２画像分類を予測する。すなわち、予測部１４は、繰り返して実行されたアノテーション結果に対して、その度に画像分類を予測することにより、利用者に対してアノテーションの結果である画像分類の予測をリアルタイムでフィードバック（画像表示）させることが可能となる。 The prediction unit 14 predicts the first image classification based on the first annotation result, and predicts the second image classification based on the second annotation result. That is, by predicting the image classification each time an annotation is repeated, the prediction unit 14 can feed the predicted image classification resulting from the annotation back to the user in real time (as an image display).

また、予測部１４は、アノテーションにおけるペイントブラシで描画されたオブジェクトの先端が示す位置のブロック（例えば、数十ピクセルのブロック）における特徴量をアノテーション結果として学習するようにしてもよい。予測部１４は、描画の先端が示す位置のブロックの特徴量と類似した特徴量を有する領域を、アノテーションにおいて指定された分類として予測する。例えば、利用者が正例として学習させた画像の位置にペイントブラシのカーソルを合わせてオブジェクトを描画することにより、利用者の意図する分類を正しく学習させることができる。 Further, the prediction unit 14 may learn the feature amount in the block (for example, a block of several tens of pixels) at the position indicated by the tip of the object drawn by the paint brush in the annotation as the annotation result. The prediction unit 14 predicts a region having a feature amount similar to the feature amount of the block at the position indicated by the tip of the drawing as the classification specified in the annotation. For example, by moving the cursor of the paint brush to the position of the image learned by the user as a positive example and drawing the object, the classification intended by the user can be correctly learned.

また、予測部１４は、アノテーションにおけるペイントブラシで描画されたオブジェクト（ストローク）の長さに応じて、分類を予測する領域の大きさを変化させるようにしてもよい。例えば、描画のストロークが短い場合、予測部１４は、分類を予測する領域の大きさを小さくする。また、描画のストロークが長い場合、予測部１４は、分類を予測する領域の大きさを大きくする。描画のストロークの長さに応じて分類を予測する領域の大きさを変化させることにより、利用者が意図する範囲において分類を予測させることが可能となる。例えば、利用者は描画のストロークを長くすることにより、一度に広い範囲の分類を予測させることができる。これにより、アノテーションの作業効率を向上させて、作業負担を軽減させることが可能となる。また、利用者は描画のストロークを短くすることにより、狭い範囲において正確な分類の予測をさせることができる。これにより、誤って予測された分類を修正するためのアノテーションの修正が少なくなり、アノテーションの作業効率を向上させて、作業負担を軽減させることが可能となる。 Further, the prediction unit 14 may change the size of the region for predicting the classification according to the length of the object (stroke) drawn by the paint brush in the annotation. For example, when the drawing stroke is short, the prediction unit 14 reduces the size of the region for predicting the classification. Further, when the drawing stroke is long, the prediction unit 14 increases the size of the region for predicting the classification. By changing the size of the area for which the classification is predicted according to the length of the drawing stroke, it is possible to predict the classification within the range intended by the user. For example, the user can predict a wide range of classifications at once by lengthening the drawing stroke. This makes it possible to improve the work efficiency of annotation and reduce the work load. In addition, the user can make an accurate prediction of classification in a narrow range by shortening the drawing stroke. As a result, it is possible to reduce the modification of annotations for correcting erroneously predicted classifications, improve the work efficiency of annotations, and reduce the work load.
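The relationship between stroke length and the size of the predicted region could look like the sketch below. The text only says that a longer stroke gives a larger region; the proportional mapping, the `scale` and `min_radius` parameters, and the circular region centered on the stroke tip are assumptions for illustration.

```python
import math

def stroke_length(points):
    """Total length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def prediction_region(points, scale=1.0, min_radius=1.0):
    """Return (center, radius) of the region in which to predict.

    The center is the stroke tip (its last point); the radius grows
    with stroke length. Both choices are illustrative assumptions.
    """
    radius = max(min_radius, scale * stroke_length(points))
    return points[-1], radius

short = prediction_region([(0, 0), (3, 4)])
long_ = prediction_region([(0, 0), (30, 40)])
print(short, long_)
```

A short stroke therefore confines prediction to a small neighborhood, while a long stroke lets one gesture cover a wide area.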

また、予測部１４は、アノテーションによって指定されていない領域（ペイントブラシで塗られていない部分）の分類を予測するようにしてもよい。アノテーションによって指定された領域は利用者によって分類が確定された領域であり、分類を予測する必要がない。予測部１４は、アノテーションによって指定された領域以外の領域の分類を予測することにより、予測する領域が少なくなり、予測における処理の効率が向上する。 Further, the prediction unit 14 may predict the classification of the area (the part not painted with the paint brush) not specified by the annotation. The area specified by the annotation is an area whose classification has been confirmed by the user, and it is not necessary to predict the classification. By predicting the classification of the area other than the area specified by the annotation, the prediction unit 14 reduces the number of areas to be predicted and improves the processing efficiency in the prediction.

通信制御部１５は、情報処理装置１とサーバ４との通信を制御する。 The communication control unit 15 controls communication between the information processing device 1 and the server 4.

なお、情報処理装置１が有する、上述の各機能部は、情報処理装置１の機能部の一例を示したものであり、情報処理装置１が有する機能を限定したものではない。例えば、情報処理装置１は、上記全ての機能部を有している必要はなく、一部の機能部を有するものであってもよい。また、情報処理装置１は、上記以外の他の機能を有していてもよい。例えば、情報処理装置１は、情報を入力するために入力機能や、装置の稼働状態をＬＥＤランプ等により報知する出力機能を有していてもよい。 The above-mentioned functional units of the information processing device 1 show an example of the functional units of the information processing device 1, and do not limit the functions of the information processing device 1. For example, the information processing device 1 does not have to have all the above-mentioned functional parts, and may have some of the functional parts. Further, the information processing device 1 may have a function other than the above. For example, the information processing device 1 may have an input function for inputting information and an output function for notifying the operating state of the device by an LED lamp or the like.

また、情報処理装置１が有する上記各機能部は、上述の通り、ソフトウェアによって実現されるものとして説明した。しかし、情報処理装置１が有する上記機能部の中で少なくとも１つ以上の機能部は、ハードウェアによって実現されるものであってもよい。 Further, each of the above-mentioned functional units included in the information processing apparatus 1 has been described as being realized by software as described above. However, at least one or more of the functional units included in the information processing device 1 may be realized by hardware.

また、情報処理装置１が有する上記何れかの機能部は、１つの機能部を複数の機能部に分割して実施してもよい。また、情報処理装置１が有する上記何れか２つ以上の機能部を１つの機能部に集約して実施してもよい。すなわち、図１は、情報処理装置１が有する機能を機能ブロックで表現したものであり、例えば、各機能部がそれぞれ別個のプログラムファイル等で構成されていることを示すものではない。 Further, any of the above-mentioned functional units included in the information processing device 1 may be implemented by dividing one functional unit into a plurality of functional units. Further, any two or more of the above-mentioned functional units included in the information processing device 1 may be integrated into one functional unit. That is, FIG. 1 shows the functions of the information processing apparatus 1 represented by functional blocks, and does not show, for example, that each functional unit is composed of a separate program file or the like.

また、情報処理装置１は、１つの筐体によって実現される装置であっても、ネットワーク等を介して接続された複数の装置から実現されるシステムであってもよい。例えば、情報処理装置１は、その機能の一部または全部をクラウドコンピューティングシステムによって提供されるクラウドサービス等、他の仮想的な装置によって実現するものであってもよい。すなわち、情報処理装置１は、上記各機能部のうち、少なくとも１以上の機能部を他の装置において実現するようにしてもよい。また、情報処理装置１は、デスクトップＰＣ等の汎用的なコンピュータであってもよく、機能が限定された専用の装置であってもよい。 Further, the information processing device 1 may be a device realized by one housing or a system realized by a plurality of devices connected via a network or the like. For example, the information processing device 1 may realize a part or all of its functions by another virtual device such as a cloud service provided by a cloud computing system. That is, the information processing device 1 may realize at least one or more of the above-mentioned functional units in another device. Further, the information processing device 1 may be a general-purpose computer such as a desktop PC, or may be a dedicated device having limited functions.

例えば、サーバ４が、上述した情報処理装置１の一部または全部を有するものであってもよい。 For example, the server 4 may have a part or all of the above-mentioned information processing device 1.

次に、図２を用いて、情報処理装置１の動作を説明する。図２は、実施形態における情報処理装置１の動作の一例を示すフローチャートである。なお、ここで説明する動作は、情報処理装置１を主体として実行される場合を説明するが、図１において説明した情報処理装置１が有する各機能において実現されてもよい。 Next, the operation of the information processing device 1 will be described with reference to FIG. FIG. 2 is a flowchart showing an example of the operation of the information processing device 1 in the embodiment. Although the operation described here will be executed mainly by the information processing device 1, it may be realized by each function of the information processing device 1 described with reference to FIG.

図２において、情報処理装置１は、機械学習における処理対象となる画像が選択されたか否かを判断する（ステップＳ１１）。処理対象の画像（元画像）の選択は、例えば、利用者が所定のフォルダから画像ファイルを選択することにより実施される。処理対象となる画像が選択されていないと判断した場合（ステップＳ１１：ＮＯ）、情報処理装置１は、ステップＳ１１の処理を繰り返して、処理対象となる画像が選択されるのを待機する。 In FIG. 2, the information processing apparatus 1 determines whether or not an image to be processed in machine learning has been selected (step S11). The image to be processed (original image) is selected, for example, by the user selecting an image file from a predetermined folder. When it is determined that the image to be processed has not been selected (step S11: NO), the information processing apparatus 1 repeats the process of step S11 and waits for the image to be processed to be selected.

一方、処理対象となる画像が選択されたと判断した場合（ステップＳ１１：ＹＥＳ）、情報処理装置１は、処理対象の画像を表示して、利用者によるアノテーションを可能にする（ステップＳ１２）。利用者によるアノテーションを可能にするとは、例えば、表示された画像に対して、利用者が領域を指定して、指定した領域にタグ付けできるようにすることをいう。 On the other hand, when it is determined that the image to be processed has been selected (step S11: YES), the information processing apparatus 1 displays the image to be processed and enables annotation by the user (step S12). To enable annotation by the user means, for example, to enable the user to specify an area for the displayed image and tag the specified area.

ステップＳ１２の処理を実行した後、情報処理装置１は、アノテーション結果を取得したか否かを判断する（ステップＳ１３）。アノテーション結果を取得していないと判断した場合（ステップＳ１３：ＮＯ）、情報処理装置１は、ステップＳ１３の処理を繰り返して、アノテーション結果を取得するのを待機する。 After executing the process of step S12, the information processing apparatus 1 determines whether or not the annotation result has been acquired (step S13). When it is determined that the annotation result has not been acquired (step S13: NO), the information processing apparatus 1 repeats the process of step S13 and waits for the annotation result to be acquired.

一方、アノテーション結果を取得したと判断した場合（ステップＳ１３：ＹＥＳ）、情報処理装置１は、取得されたアノテーション結果を学習して、画像分類を予測する（ステップＳ１４）。 On the other hand, when it is determined that the annotation result has been acquired (step S13: YES), the information processing apparatus 1 learns the acquired annotation result and predicts the image classification (step S14).

ステップＳ１４の処理を実行した後、情報処理装置１は、機械学習の処理を終了するか否かを判断する（ステップＳ１５）。機械学習の処理を終了するか否かは、例えば、利用者が処理の終了を明示的に指示したか否かで判断することができる。機械学習の処理を終了するか否かは、例えば、画像分類されていない領域が所定の割合以下になったか否かで判断するようにしてもよい。機械学習の処理を終了しないと判断した場合（ステップＳ１５：ＮＯ）、情報処理装置１は、ステップＳ１２の処理に戻り、ステップＳ１４において予測された画像分類を含む画像を処理対象として表示して、利用者によるアノテーションを可能にする。ステップＳ１２において表示される画像がステップＳ１４の処理において予測された画像分類を含む画像である場合、ステップＳ１２において表示される画像は、例えば、元画像に対して予測された画像分類毎に色分けされた色の情報を重畳することで生成される。 After executing the process of step S14, the information processing device 1 determines whether or not to end the machine learning process (step S15). Whether or not to end the machine learning process can be determined, for example, by whether or not the user has explicitly instructed the end of the process. Whether or not to end the machine learning process may be determined, for example, by whether or not the area not classified as an image is equal to or less than a predetermined ratio. When it is determined that the machine learning process is not completed (step S15: NO), the information processing apparatus 1 returns to the process of step S12, displays an image including the image classification predicted in step S14 as a processing target, and displays the image as a processing target. Allows user annotation. When the image displayed in step S12 is an image including the image classification predicted in the process of step S14, the image displayed in step S12 is color-coded for each predicted image classification with respect to the original image, for example. It is generated by superimposing color information.

上述のように、第１画像は第２画像より先に表示される画像である。例えば、ステップＳ１２の処理において第１画像が表示される場合、情報処理装置１は、ステップＳ１４の処理において、第１アノテーション結果に基づき第１画像分類を予測し、ステップＳ１２の処理において第１画像分類を含む第２画像をアノテーション可能に表示し（ステップＳ１５：ＮＯの場合）、ステップＳ１４の処理において、第２アノテーション結果に基づき第２画像分類を予測し、さらにステップＳ１２の処理において第２画像分類を含む第３画像をアノテーション可能に表示する（ステップＳ１５：ＮＯの場合）。第１画像は、例えば、ステップＳ１１の処理において選択されたと判断された元画像、またはステップＳ１４の処理において予測された画像分類を含む画像である。また、第２画像は、ステップＳ１４の処理において予測された第１画像分類を含む画像である。すなわち、情報処理装置１は、ステップＳ１５の処理において処理を終了しない限り、アノテーションの繰り返しを可能として、繰り返されたアノテーションにそれぞれ対応して画像分類の予測を実施することができる。上述した、第１画像の表示、第１アノテーション結果の取得、および第１画像分類の予測は、ステップＳ１１〜ステップＳ１４の処理を実行することにより実施することができ、さらに、第２画像の表示、第２アノテーション結果の取得、および第２画像分類の予測は、ステップＳ１２〜ステップＳ１４の処理を繰り返して実行することにより実施することができる。 As described above, the first image is an image displayed before the second image. For example, when the first image is displayed in the process of step S12, the information processing apparatus 1 predicts the first image classification based on the first annotation result in the process of step S14, and the first image in the process of step S12. The second image including the classification is displayed in an annotable manner (step S15: in the case of NO), the second image classification is predicted based on the second annotation result in the process of step S14, and the second image is further displayed in the process of step S12. The third image including the classification is displayed in an annotable manner (step S15: in the case of NO). The first image is, for example, an original image determined to be selected in the process of step S11, or an image including an image classification predicted in the process of step S14. The second image is an image including the first image classification predicted in the process of step S14. That is, the information processing apparatus 1 can repeat the annotation and predict the image classification corresponding to each of the repeated annotations unless the processing is completed in the process of step S15. 
The above-mentioned display of the first image, acquisition of the first annotation result, and prediction of the first image classification can be performed by executing the processes of steps S11 to S14, and further, the display of the second image. , The acquisition of the second annotation result, and the prediction of the second image classification can be performed by repeatedly executing the processes of steps S12 to S14.

一方、機械学習の処理を終了すると判断した場合（ステップＳ１５：ＹＥＳ）、情報処理装置１は、フローチャートに示す処理を終了する。 On the other hand, when it is determined that the machine learning process is finished (step S15: YES), the information processing apparatus 1 ends the process shown in the flowchart.
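The flow of steps S11 to S15 can be summarized as a loop. This is a sketch under stated assumptions: the five callables are hypothetical stand-ins for the functional units, and the waiting in steps S11 and S13 is modeled by callables that block until an image or an annotation result is available.

```python
def run_annotation_loop(select_image, display, get_annotation,
                        learn_and_predict, should_finish):
    """Drive the S11-S15 loop with caller-supplied (hypothetical) callables."""
    image = select_image()                       # S11: wait for an image
    shown = image                                # first image: the original
    while True:
        display(shown)                           # S12: show, allow annotation
        annotation = get_annotation()            # S13: wait for a result
        prediction = learn_and_predict(image, annotation)  # S14
        if should_finish(prediction):            # S15: YES -> end
            return prediction
        shown = (image, prediction)              # S15: NO -> show prediction

# Usage with trivial stand-ins: two annotation rounds, then finish.
annotations = iter(["first", "second"])
result = run_annotation_loop(
    select_image=lambda: "original",
    display=lambda img: None,
    get_annotation=lambda: next(annotations),
    learn_and_predict=lambda img, ann: f"prediction from {ann}",
    should_finish=lambda pred: pred.endswith("second"),
)
print(result)  # prediction from second
```

Each pass through the loop corresponds to one round of display, annotation, and prediction, so the second and third images in the text are simply what `display` receives on later passes.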

なお、図示したフローチャートは、情報処理装置１において実行される処理の一例を示すものであり、実行される処理を限定するものではない。例えば、図示した処理は、第１画像の表示と第２画像の表示（ステップＳ１２）、第１アノテーション結果の取得と第２アノテーション結果の取得（ステップＳ１３）、および第１画像分類の予測と第２画像分類の予測（ステップＳ１４）を、それぞれ同じ処理（ステップ）において実行する場合を例示した。しかし、第１画像の表示と第２画像の表示、第１アノテーション結果の取得と第２アノテーション結果の取得、および第１画像分類の予測と第２画像分類の予測を、それぞれ別の処理として実行するようにしてもよい。また、処理の終了を処理の途中で実行出来るようにしてもよい。例えば、ステップＳ１１またはステップＳ１３における待機中に処理を中断できるようにしてもよい。 The illustrated flowchart shows an example of the processing executed by the information processing apparatus 1, and does not limit the processing to be executed. For example, the illustrated processes include display of the first image and display of the second image (step S12), acquisition of the first annotation result and acquisition of the second annotation result (step S13), and prediction of the first image classification and the first. An example is shown in which the prediction of the two image classifications (step S14) is executed in the same process (step). However, the display of the first image and the display of the second image, the acquisition of the first annotation result and the acquisition of the second annotation result, and the prediction of the first image classification and the prediction of the second image classification are executed as separate processes. You may try to do it. Further, the end of the process may be executed in the middle of the process. For example, the process may be interrupted during the standby in step S11 or step S13.

次に、図３を用いて、情報処理装置１のハードウェア構成を説明する。図３は、実施形態における情報処理装置１のハードウェア構成の一例を示すブロック図である。 Next, the hardware configuration of the information processing apparatus 1 will be described with reference to FIG. FIG. 3 is a block diagram showing an example of the hardware configuration of the information processing apparatus 1 according to the embodiment.

情報処理装置１は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）１０１、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）１０２、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）１０３、Ｉ／Ｏ機器１０４、および通信Ｉ／Ｆ（Ｉｎｔｅｒｆａｃｅ）１０５を有する。情報処理装置１は、図１で説明した情報処理プログラムを実行する装置である。 The information processing device 1 includes a CPU (Central Processing Unit) 101, a RAM (Random Access Memory) 102, a ROM (Read Only Memory) 103, an I / O device 104, and a communication I / F (Interface) 105. The information processing device 1 is a device that executes the information processing program described with reference to FIG.

ＣＰＵ１０１は、ＲＡＭ１０２またはＲＯＭ１０３に記憶された情報処理プログラムを実行することにより、利用者端末の制御を行う。情報処理プログラムは、例えば、プログラムを記録した記録媒体、又はネットワークを介したプログラム配信サーバ等から取得されて、ＲＯＭ１０３にインストールされ、ＣＰＵ１０１から読出されて実行される。 The CPU 101 controls the user terminal by executing the information processing program stored in the RAM 102 or the ROM 103. The information processing program is obtained, for example, from a recording medium on which the program is recorded or from a program distribution server over a network, installed in the ROM 103, and then read and executed by the CPU 101.

Ｉ／Ｏ機器１０４は、操作入力機能と表示機能（操作表示機能）を有する。Ｉ／Ｏ機器１０４は、例えばタッチパネルである。タッチパネルは、情報処理装置１の利用者に対して指先又はタッチペン等を用いた操作入力を可能にする。本実施形態におけるＩ／Ｏ機器１０４は、操作表示機能を有するタッチパネルを用いる場合を説明するが、Ｉ／Ｏ機器１０４は、表示機能を有する表示装置と操作入力機能を有する操作入力装置とを別個有するものであってもよい。その場合、タッチパネルの表示画面は表示装置の表示画面、タッチパネルの操作は操作入力装置の操作として実施することができる。なお、Ｉ／Ｏ機器１０４は、ヘッドマウント型、メガネ型、腕時計型のディスプレイ等の種々の形態によって実現されてもよい。 The I / O device 104 has an operation input function and a display function (operation display function). The I / O device 104 is, for example, a touch panel. The touch panel enables the user of the information processing device 1 to input operations using a fingertip, a touch pen, or the like. The case where the I / O device 104 in the present embodiment uses a touch panel having an operation display function will be described, but the I / O device 104 separates the display device having the display function and the operation input device having the operation input function. It may have. In that case, the display screen of the touch panel can be performed as the display screen of the display device, and the operation of the touch panel can be performed as the operation of the operation input device. The I / O device 104 may be realized by various forms such as a head mount type, a glasses type, and a wristwatch type display.

通信Ｉ／Ｆ１０５は、通信用のＩ／Ｆである。通信Ｉ／Ｆ１０５は、例えば、無線ＬＡＮ、有線ＬＡＮ、赤外線等の近距離無線通信を実行する。図は通信用のＩ／Ｆとして通信Ｉ／Ｆ１０５のみを図示するが、情報処理装置１は複数の通信方式においてそれぞれの通信用のＩ／Ｆを有するものであってもよい。 The communication I / F 105 is an I / F for communication. The communication I / F 105 executes short-range wireless communication such as wireless LAN, wired LAN, and infrared rays. Although the figure shows only the communication I / F 105 as the communication I / F, the information processing device 1 may have each communication I / F in a plurality of communication methods.

次に、図４〜図６を用いて、指定ツールによるアノテーションについて説明する。図４〜図６は、ホテルの建物の手前の噴水を正例としてアノテーションする場合を示している。噴水と建物はともに白色であって、ＲＧＢ値における差異が小さいので、アノテーションによる教師データが必要となる。 Next, the annotation by the designated tool will be described with reference to FIGS. 4 to 6. FIGS. 4 to 6 show a case where the fountain in front of the hotel building is annotated as a positive example. Since both the fountain and the building are white and the difference in RGB values is small, teacher data by annotation is required.

図４は、第１の実施例における、ブラシストローク（Ａ）、および画像分類の予測（Ｂ）を示す図である。第１の実施例は、指定ツールに含まれるペイントブラシにおいて、短いストロークのアノテーションが行われた場合の画像分類の予測を示す。 FIG. 4 is a diagram showing a brush stroke (A) and an image classification prediction (B) in the first embodiment. The first embodiment shows the prediction of image classification when short stroke annotation is performed in the paint brush included in the designated tool.

図４（Ａ）において、ストロークｓａ１は、噴水を正例とするアノテーションであって、短いストロールにおいて噴水の画像を指定している。図４（Ｂ）において、画像分類ｓｙ１は、ストロークｓａ１の先端部分を中心として、ストロークｓａ１の長さに応じた大きさ（面積）において、正例（噴水）と判断した画像分類の予測を示している。特徴量の類似度は、例えば、特徴量（特徴ベクトル）のコサイン類似度で計算することができる。ここで、画像分類ｓｙ１は、その一部が負例である建物にはみ出して予測されているが、多くの部分は正しく正例を予測している。したがって、短いストロークにおいてアノテーションを行った場合、画像分類の予測の精度を向上させることができ、アノテーションを修正する手間を少なくすることが可能となる。 In FIG. 4(A), the stroke sa1 is an annotation that marks the fountain as a positive example, and it designates the image of the fountain with a short stroke. In FIG. 4(B), the image classification sy1 shows the predicted image classification judged to be a positive example (fountain), centered on the tip of the stroke sa1 and with a size (area) corresponding to the length of the stroke sa1. The similarity of feature amounts can be calculated, for example, as the cosine similarity of the feature amounts (feature vectors). Here, part of the image classification sy1 spills over onto the building, which is a negative example, but most of it correctly predicts the positive example. Therefore, annotating with a short stroke can improve the accuracy of the image classification prediction and reduce the effort of correcting the annotation.
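As a reminder of what the cosine similarity of two feature vectors computes, a minimal sketch follows. The function name and the choice to return 0.0 for a zero-length vector are assumptions for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    # A zero vector has no direction; treat it as dissimilar (assumption).
    return dot / norm if norm else 0.0

print(cosine_similarity((1, 0), (0, 1)))  # 0.0
```

Because cosine similarity ignores vector magnitude, two blocks whose features differ only by a common scale factor score close to 1.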

図５は、第２の実施例における、ブラシストローク（Ａ）、および画像分類の予測（Ｂ）を示す図である。第２の実施例は、指定ツールに含まれるペイントブラシにおいて、長いストロークのアノテーションが行われた場合の画像分類の予測を示す。 FIG. 5 is a diagram showing a brush stroke (A) and an image classification prediction (B) in the second embodiment. The second embodiment shows the prediction of image classification when long stroke annotation is performed in the paint brush included in the designated tool.

図５（Ａ）において、ストロークｓａ２は、正例のアノテーションであって、ストロークｓａ１に比べて長いストロールにおいて噴水の画像を指定している。図５（Ｂ）において、画像分類ｓｙ２は、ストロークｓａ２の先端部分を中心として、ストロークｓａ２の長さに応じた大きさにおいて、噴水と判断した画像分類の予測を示している。ここで、画像分類ｓｙ２は、多くの部分が建物（負例）にはみ出して予測されているが、噴水（正例）についても大きな面積において正しく噴水を予測している。したがって、長いストロークにおいてアノテーションを行った場合、一度に大きな面積において画像分類を予測することができ、作業効率を向上させることができる。 In FIG. 5(A), the stroke sa2 is a positive-example annotation, and it designates the image of the fountain with a stroke longer than the stroke sa1. In FIG. 5(B), the image classification sy2 shows the predicted image classification judged to be the fountain, centered on the tip of the stroke sa2 and with a size corresponding to the length of the stroke sa2. Here, much of the image classification sy2 spills over onto the building (negative example), but it also correctly predicts the fountain (positive example) over a large area. Therefore, annotating with a long stroke allows the image classification to be predicted over a large area at once, which improves work efficiency.

図６は、第３の実施例における、ブラシストローク（Ａ）、および画像分類の予測（Ｂ）を示す図である。第３の実施例は、負例による画像分類がされた状態において、第２の実施例と同様に指定ツールに含まれるペイントブラシにおいて、長いストロークのアノテーションが行われた場合の画像分類の預測を示す。 FIG. 6 is a diagram showing a brush stroke (A) and an image classification prediction (B) in the third embodiment. The third embodiment shows the image classification prediction when, with a negative-example image classification already in place, a long-stroke annotation is made with the paint brush included in the designation tool, as in the second embodiment.

図６（Ａ）において、ストロークｓａ３は、正例のアノテーションであって、ストロークｓａ１に比べて長いストロールにおいて噴水の画像を指定している。画像分類ｆｙ１は、負例である建物の壁面へのアノテーションに基づく、負例の画像分類の予測を示している。画像分類ｆｙ１は、図示しないストロークの先端部分を中心として、図示しないストロークの長さに応じた大きさにおいて、負例（噴水以外）と判断された画像分類の予測を示している。 In FIG. 6(A), the stroke sa3 is a positive-example annotation, and it designates the image of the fountain with a stroke longer than the stroke sa1. The image classification fy1 shows the predicted negative-example image classification based on an annotation on the wall of the building, which is a negative example. The image classification fy1 shows the predicted image classification judged to be a negative example (anything other than the fountain), centered on the tip of a stroke that is not shown and with a size corresponding to the length of that stroke.

In FIG. 6B, image classification sy3 is the image classification in which, through comparison with the feature amount of image classification fy1 (a negative example), the portion whose similarity is higher than its similarity to image classification fy1 is predicted as a positive example. When predicting the positive-example image classification, the image classification is predicted by comparing similarity against the feature amounts of the negative-example blocks annotated in the past (for example, in the most recent annotation). Conversely, when predicting the negative-example image classification, the image classification is predicted by comparing similarity against the feature amounts of the positive-example blocks annotated in the past (for example, in the most recent annotation). For example, the initial annotation is performed by the user freely designating positive or negative examples in the image while no image classification has yet been predicted. Once an image classification has been predicted, however, regions correctly predicted as positive or negative examples are mixed with regions that are incorrectly predicted. Re-annotation is therefore performed to correct the incorrectly predicted regions. As a result, the feature amounts in the re-annotation are often similar to those in the most recent annotation, and comparing the two makes it possible to improve the prediction accuracy of the image classification. Accordingly, by saving the feature amounts of the blocks designated in the most recent annotation and comparing their similarity with the feature amounts of the region to be newly predicted, accurate image classification prediction becomes possible.
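The comparison against saved feature amounts described above can be sketched as follows. This is a minimal illustration, assuming that each block's feature amount is a small numeric vector and that similarity is the negative Euclidean distance; the names `similarity` and `predict_block` and the sample values are hypothetical and are not taken from the specification.

```python
import math

def similarity(f, g):
    """Similarity of two feature vectors (assumed: negative Euclidean distance)."""
    return -math.dist(f, g)

def predict_block(feature, last_positive, last_negative):
    """Label a block by whichever saved annotation features it resembles most."""
    best_pos = max(similarity(feature, p) for p in last_positive)
    best_neg = max(similarity(feature, q) for q in last_negative)
    return "positive" if best_pos > best_neg else "negative"

# Feature amounts saved from the most recent positive and negative annotations.
last_positive = [(0.8, 0.2), (0.7, 0.3)]
last_negative = [(0.1, 0.9)]

# Newly predicted blocks are compared against the saved feature amounts.
labels = [predict_block(f, last_positive, last_negative)
          for f in [(0.75, 0.25), (0.2, 0.8)]]
```

Keeping only the most recent annotation's features, as in this sketch, keeps the comparison small; retaining a longer history would be an equally valid design under the "for example, in the most recent annotation" wording.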

Next, the annotation workflow will be described with reference to FIGS. 7 to 11. As with the images described with reference to FIGS. 4 to 6, FIGS. 7 to 11 show a case in which the fountain in front of the hotel building is annotated as a positive example.

FIG. 7 is a diagram showing the initial annotation in the fourth example. In the fourth example, positive-example and negative-example annotations are first made over a certain range in the initial annotation, and the image classification is then predicted based on an explicit instruction from the user. For additional predictions, it is periodically determined whether the user has made a new annotation, and when it is determined that an annotation has been made (an annotation result has been acquired), the image classification is predicted automatically.

In FIG. 7, the two locations of image classification sy4 and image classification sy5 are the positive-example prediction results in the initial annotation. The four locations of image classification fy2, image classification fy3, image classification fy4, and image classification fy5 are the negative-example prediction results in the initial annotation. In ordinary image recognition, negative examples often have a greater diversity of feature amounts. For this reason, in the fourth example, the amount of negative-example annotation (area, number of blocks, or number of samples) is made larger than the amount of positive-example annotation, which improves the prediction accuracy of the image classification. Note that the ratio between the amounts of positive-example and negative-example annotation may be made changeable according to the type of image.

FIG. 8 is a diagram showing the predicted image classification in the fourth example. In FIG. 8, the region predicted as a positive example (the fountain) is displayed semi-transparently, and the region predicted as a negative example (portions other than the fountain) is displayed filled in. Because the user can see where parts of the fountain are filled in, incorrectly predicted regions can be easily identified.

Note that in the fourth example, the blocks for which feature amounts are calculated within the positive-example or negative-example regions annotated by the user are limited to, for example, 2,000 blocks. This significantly improves processing speed compared with calculating feature amounts for all blocks designated by the user's annotation. Limiting the number of blocks for which feature amounts are calculated slightly lowers the prediction accuracy of the image classification. In the present embodiment, however, the interactive UI allows the user to easily recognize errors in the predicted classification and to easily correct the annotation, so the reduced prediction accuracy does not pose a problem.
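One way to limit the number of blocks for which feature amounts are calculated is sketched below. The specification states only the example limit of 2,000 blocks; the use of uniform random sampling, the fixed seed, and the names `MAX_BLOCKS` and `select_blocks` are assumptions made for this illustration.

```python
import random

MAX_BLOCKS = 2000  # example limit mentioned in the text

def select_blocks(annotated_blocks, limit=MAX_BLOCKS, seed=0):
    """Return at most `limit` blocks from the annotated region.
    Uniform random sampling is an assumed selection rule."""
    if len(annotated_blocks) <= limit:
        return list(annotated_blocks)
    return random.Random(seed).sample(annotated_blocks, limit)

# A hypothetical annotated region covering 10,000 blocks.
blocks = [(x, y) for x in range(100) for y in range(100)]
selected = select_blocks(blocks)
```

Feature amounts would then be computed only for `selected`, so the cost of one prediction cycle is bounded regardless of how large an area the user annotates.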

Further, in the present embodiment, the positive-example or negative-example judgments manually annotated by the user may be given priority over the results predicted by machine learning when the next image classification is predicted. Because the user's manual judgments take priority over the machine-learning predictions, prediction errors converge as the number of annotations increases. Reduced prediction accuracy therefore does not become a problem. Meanwhile, the improved processing speed shortens the response time from annotation to prediction as perceived by the user, making it possible to improve interactive operability in the annotation work.
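Giving manual annotations priority over predicted results can be illustrated with the following sketch. The per-block dictionary representation and the name `merge_labels` are hypothetical; the specification does not describe how the labels are stored.

```python
def merge_labels(predicted, manual):
    """Combine per-block labels so that manual annotations take priority.
    Both arguments map a block id to "positive" or "negative"."""
    merged = dict(predicted)
    merged.update(manual)  # the user's judgment overrides the prediction
    return merged

predicted = {"b1": "negative", "b2": "positive", "b3": "negative"}
manual = {"b1": "positive"}  # the user corrected block b1
merged = merge_labels(predicted, manual)
```

In this sketch the merged labels, rather than the raw predictions, would serve as the basis for the next prediction, so each correction by the user is never undone by the model.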

FIG. 9 is a diagram showing a corrective annotation in the fourth example. In FIG. 9, the user re-annotates the incorrectly predicted region at the lower part of the fountain with positive-example stroke sa4. A portion erroneously predicted as a negative example is a portion whose feature amount is more similar to the negative example than to the positive example. By re-annotating such a portion as a positive example, the feature amounts can be learned more accurately and the prediction accuracy of the image classification can be improved.

For example, consider a case in which the feature amount is represented by the magnitude of a single parameter. Suppose that feature amounts correctly predicted as positive are a1 or greater and feature amounts correctly predicted as negative are less than a2. For a region whose feature amount is n (where a1 ≥ n > a2), whether it is predicted as positive or negative is determined by whether n is more similar to a1 or to a2. If a3 (where a1 > a3) is then annotated as a new positive example, the prediction for the region with feature amount n is determined by whether n is more similar to a3 or to a2. Because |a1 − a2| > |a3 − a2|, annotating a3 as a new positive example reduces the regions erroneously predicted as negative and increases the regions correctly predicted as positive. The prediction accuracy can thereby be improved.
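The single-parameter case above can be worked through numerically. The sketch below assumes that similarity is measured by absolute difference, so that the boundary between positive and negative lies midway between a2 and the nearest positive value; the concrete values of a1, a2, a3, and n and the name `decision_boundary` are hypothetical.

```python
def decision_boundary(positives, negative):
    """Midpoint between the negative value and the nearest positive value.
    Feature amounts above the boundary are closer to a positive example."""
    nearest_positive = min(positives, key=lambda p: abs(p - negative))
    return (nearest_positive + negative) / 2

a1, a2, a3 = 10.0, 0.0, 6.0  # hypothetical values with a1 > a3 > a2
n = 4.0                      # feature amount of the region being predicted

before = decision_boundary([a1], a2)      # only a1 annotated as positive
after = decision_boundary([a1, a3], a2)   # a3 added as a new positive example
```

With these values the boundary moves from 5.0 to 3.0, so the region with n = 4.0, which was closer to a2 before the re-annotation, becomes closer to a positive example afterwards.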

When the user has performed no correction by re-annotation for a certain period of time or longer, the information processing device 1 acquires the feature amounts of the newly annotated regions as an annotation result and re-executes the image classification prediction based on the newly acquired annotation result.
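The idle-time condition for re-executing the prediction might be implemented along the following lines. This is only a sketch: the class name `RepredictionTrigger`, the two-second threshold, and the practice of passing the current time explicitly are assumptions, and the specification does not state the length of the waiting period.

```python
class RepredictionTrigger:
    """Fires once when no annotation has arrived for `idle_seconds`."""

    def __init__(self, idle_seconds):
        self.idle_seconds = idle_seconds
        self.last_annotation = None  # time of the latest annotation
        self.pending = False         # annotations not yet used for prediction

    def on_annotation(self, now):
        """Record that the user annotated at time `now` (seconds)."""
        self.last_annotation = now
        self.pending = True

    def should_predict(self, now):
        """Return True once the user has been idle long enough."""
        if self.pending and now - self.last_annotation >= self.idle_seconds:
            self.pending = False
            return True
        return False

trigger = RepredictionTrigger(idle_seconds=2.0)
trigger.on_annotation(now=10.0)
events = [trigger.should_predict(t) for t in (11.0, 12.5, 13.0)]
```

Passing `now` as an argument rather than reading a clock inside the class keeps the sketch deterministic; a real UI loop could supply `time.monotonic()` on each poll.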

FIG. 10 is a diagram showing the re-predicted image classification in the fourth example. In FIG. 10, the region at the lower part of the fountain is correctly predicted as a positive example as a result of the re-annotation with stroke sa4. As the user repeats annotation, the erroneously predicted regions gradually shrink, and most of the positive example is extracted.

FIG. 11 is a diagram showing the division of regions in the fourth example. In FIG. 11, the first to sixth regions are divided according to the combination of whether the user annotated the region as a positive or negative example and whether the positive or negative example was correctly predicted. The first region is a region that the user annotated as a positive example and that was correctly predicted. The second region is a region that the user annotated as a negative example and that was correctly predicted. The third region is a region that the user annotated as a positive example and that was erroneously predicted. The fourth region is a region that the user annotated as a negative example and that was erroneously predicted. The fifth region is a region in which a positive example was correctly predicted without user annotation. The sixth region is a region in which a negative example was correctly predicted without user annotation. In the fourth example, because annotation and prediction have been repeated, no region that was erroneously predicted without user annotation is included. That is, every erroneously predicted region that had no user annotation has been corrected by annotation.
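The six-way division described for FIG. 11 can be expressed as a small lookup. The function name `region_number` and its argument representation are assumptions for illustration; the case that the text says does not remain (erroneous prediction without user annotation) is returned as `None`.

```python
def region_number(annotated, truth, predicted):
    """Return the region number (1 to 6) described for FIG. 11.

    annotated: True if the user annotated the region.
    truth:     "positive" or "negative" (for annotated regions, the user's label).
    predicted: "positive" or "negative" as predicted by the model.
    Returns None for an unannotated, erroneously predicted region.
    """
    correct = truth == predicted
    if annotated:
        if truth == "positive":
            return 1 if correct else 3
        return 2 if correct else 4
    if correct:
        return 5 if truth == "positive" else 6
    return None

# Example: a region annotated as positive but predicted as negative.
example = region_number(True, "positive", "negative")
```

Counting how many blocks fall into the third and fourth regions after each cycle would give one simple measure of whether the annotation and prediction loop is converging.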

As shown in FIG. 11, even when the user's annotation is given priority, a region annotated by the user may still be erroneously predicted. That is, the prediction may be wrong even when the amount of teacher data has been increased by annotation. However, as annotation and prediction are repeated, the erroneously recognized regions gradually decrease and converge. Note that in the fourth example, almost all of the fountain image, which is the positive example, is extracted, and as a consequence many regions in which negative examples are erroneously predicted as positive are included. In image recognition, it is important to extract all of the objects that are to be extracted as positive examples, and erroneously predicting negative examples as positive is often not a problem. That is, by allowing regions in which negative examples are erroneously predicted as positive to remain, the user's workload in annotation can be reduced.

Next, an example using a medical image will be described with reference to FIG. 12. FIG. 12 is a diagram showing the original image (A), the annotation result (B), and the predicted image classification (C) in the fifth example. In the fifth example, blood vessel portions in a fundus photograph are extracted as positive examples.

FIG. 12A is the original, unprocessed image of the fundus photograph. The fundus photograph has an overall reddish tint. Medical images generally show a similar tendency, and differences in feature amounts due to color are often small. FIG. 12B is the image resulting from annotating the original image. The lower left portion of FIG. 12B is an enlarged view of part of the illustrated image. In FIG. 12B, stroke sa5 and stroke sa6 are portions annotated as positive examples. Image classification fy6 is a portion other than the blood vessels that was annotated as a negative example. Note that positive-example annotations are made at several locations in addition to stroke sa5 and stroke sa6, and negative-example annotations are made at several locations in addition to image classification fy6. FIG. 12C is the prediction result of the image classification. In FIG. 12C, most of the blood vessels are correctly predicted as positive examples. This shows that even for a medical image with small differences in feature amounts, highly accurate prediction is possible when positive-example and negative-example annotations are made appropriately. In the present embodiment, annotation and confirmation of the prediction result can be performed interactively, so the user can immediately check whether an annotation produced a good prediction result. This improves the user's annotation skill, and efficient annotation makes it possible to reduce the workload.

Note that the various processes of the present embodiment described above may be performed by recording a program for realizing the functions of the device described in the present embodiment on a computer-readable recording medium, and causing a computer system to read and execute the program recorded on the recording medium. The "computer system" referred to here may include an OS and hardware such as peripheral devices. The "computer system" also includes a homepage providing environment (or display environment) if a WWW system is used. The "computer-readable recording medium" refers to a flexible disk, a magneto-optical disk, a ROM, a writable non-volatile memory such as a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into a computer system.

Furthermore, the "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line. The program may also be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line. The program may be one that realizes only some of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

Although an embodiment of the present invention has been described above with reference to the drawings, the specific configuration is not limited to this embodiment, and various modifications within a scope that does not depart from the gist of the present invention are also included.

1 Information processing device
11 Providing unit
12 Acquisition unit
13 Storage unit
14 Prediction unit
15 Communication control unit
2 Display device
3 Input device
4 Server
9 Network
101 CPU
102 RAM
103 ROM
104 I/O device
105 Communication I/F
sa1, sa2, sa3, sa4, sa5, sa6 Stroke (positive example)
sy1, sy2, sy3, sy4, sy5 Image classification (positive example)
fy1, fy2, fy3, fy4, fy5, fy6 Image classification (negative example)

Claims

An information processing device comprising:
a providing unit that displays an image processed in machine learning and provides a user interface (UI) that enables a user to annotate the image;
an acquisition unit that acquires an annotation result in which the user has designated a region in the image displayed on the UI provided by the providing unit; and
a prediction unit that learns the annotation result acquired by the acquisition unit and predicts a classification of the image,
wherein the providing unit provides the UI displaying a first image,
the acquisition unit acquires a first annotation result in which the user has designated a region in the first image,
the prediction unit predicts a first image classification based on the first annotation result acquired by the acquisition unit,
the providing unit provides the UI displaying a second image including the first image classification predicted by the prediction unit,
the acquisition unit acquires a second annotation result in which the user has designated a region in the second image,
the prediction unit further learns the second annotation result acquired by the acquisition unit and predicts a second image classification,
the providing unit provides the UI displaying a third image including the second image classification predicted by the prediction unit,
the acquisition unit acquires the annotation result including a size of the region designated by the user, and
the prediction unit changes a size of a region for which the image classification is predicted in accordance with the size.

The information processing device according to claim 1, wherein the acquisition unit acquires the second annotation result in which the user has designated, in the second image, a region that corrects the classification of the first image classification, and
the prediction unit further learns the second annotation result in which the classification of the first image classification has been corrected, and predicts the second image classification in which the classification of the first image classification is corrected.

The information processing device according to claim 1 or 2, wherein the acquisition unit acquires the annotation result including, as the size, a length of the region designated by the user.

The information processing device according to any one of claims 1 to 3, wherein the prediction unit extracts a feature amount of the region designated by the user based on the annotation result, and predicts the image classification of a region whose feature amount is similar to the extracted feature amount.

The information processing device according to claim 4, wherein the prediction unit extracts the feature amount from the region designated by the user based on at least one of an average value of pixel information, a variance value of the pixel information, and a variance value of the pixel information after edge enhancement.

The information processing device according to claim 4 or 5, wherein the prediction unit extracts the feature amount based on weights set for each of the average value of pixel information, the variance value of the pixel information, and the variance value of the pixel information after edge enhancement.

An information processing method in an information processing device, the method comprising:
a providing step of displaying an image processed in machine learning and providing a user interface (UI) that enables a user to annotate the image;
an acquisition step of acquiring an annotation result in which the user has designated a region in the image displayed on the UI provided in the providing step; and
a prediction step of learning the annotation result acquired in the acquisition step and predicting a classification of the image,
the method including:
a step of providing the UI displaying a first image;
a step of acquiring a first annotation result in which the user has designated a region in the first image;
a step of predicting a first image classification based on the acquired first annotation result;
a step of providing the UI displaying a second image including the predicted first image classification;
a step of acquiring a second annotation result in which the user has designated a region in the second image;
a step of further learning the acquired second annotation result and predicting a second image classification;
a step of providing the UI displaying a third image including the predicted second image classification;
a step of acquiring the annotation result including a size of the region designated by the user; and
a step of changing a size of a region for which the image classification is predicted in accordance with the size.

An information processing program for causing a computer to realize:
a providing function of displaying an image processed in machine learning and providing a user interface (UI) that enables a user to annotate the image;
an acquisition function of acquiring an annotation result in which the user has designated a region in the image displayed on the UI provided by the providing function; and
a prediction function of learning the annotation result acquired by the acquisition function and predicting a classification of the image,
wherein the providing function provides the UI displaying a first image,
the acquisition function acquires a first annotation result in which the user has designated a region in the first image,
the prediction function predicts a first image classification based on the first annotation result acquired by the acquisition function,
the providing function provides the UI displaying a second image including the first image classification predicted by the prediction function,
the acquisition function acquires a second annotation result in which the user has designated a region in the second image,
the prediction function further learns the second annotation result acquired by the acquisition function and predicts a second image classification,
the providing function provides the UI displaying a third image including the second image classification predicted by the prediction function,
the acquisition function acquires the annotation result including a size of the region designated by the user, and
the prediction function changes a size of a region for which the image classification is predicted in accordance with the size.