JP7628926B2

JP7628926B2 - Information processing device, information processing method, and information processing program

Info

Publication number: JP7628926B2
Application number: JP2021162675A
Authority: JP
Inventors: 薫樹小林; 洋史近藤; 泰隆長谷川; 裕司鎌田; 俊太郎由井; 秀行伴; 隆秀新家
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2021-10-01
Filing date: 2021-10-01
Publication date: 2025-02-12
Anticipated expiration: 2041-10-01
Also published as: JP2023053565A

Description

The present invention relates to an information processing device, an information processing method, and an information processing program for longitudinal data.

In information processing that uses machine learning and similar methods, preprocessing that removes outliers from the large volume of training data is considered important. If the training data contains outliers, the learned result may differ from what was intended, or learning efficiency may drop.

Outlier removal methods differ depending on whether the training data is cross-sectional data, which represents information at a single point in time, or time-series data, which runs along a time axis. For cross-sectional data, the Smirnov-Grubbs test and Hotelling's theory are well known. These methods use the mean and variance of the data as parameters, so they are difficult to apply to time-series data, whose parameters change dynamically from one time point to the next.

An outlier in time-series data arises in two cases: (1) a time point within a single time-series sample exhibits an abnormal value, or (2) among many time-series samples at a given time point, one sample is peculiar.

For outliers of type (1) above, Japanese Patent Application Laid-Open No. 2008-117381 (Patent Document 1) discloses a time-series data analysis device intended to enable optimal modeling. The device detects abnormal values within a sample by taking the difference between each original value in the sample and the corresponding value smoothed along the time series.

For outliers of type (2) above, International Publication No. WO 2016/116961 (Patent Document 2) discloses an information processing device for determining more accurately the time at which a time-series signal begins to show an abnormality. The device detects abnormalities based on preset upper and lower limits for normal data.

Patent Document 1: JP 2008-117381 A
Patent Document 2: International Publication No. WO 2016/116961

However, neither outlier removal method can evaluate the change that one time series exhibits along the time axis relative to the other samples in the training data. In addition, when a time series has few observation points, smoothing it or representing it with a statistical model such as an autoregressive model becomes difficult.

In this specification, time-series data with a small number of observation points is called longitudinal data. The object of the present invention is to realize information processing characterized by outlier removal that, for a large set of longitudinal training data used for machine learning or similar learning, checks both how peculiar each time point is compared with other samples and how peculiar the variation within each sample is.

An information processing device according to one aspect of the invention disclosed in this application includes a processor that executes a program and a storage device that holds the program. The storage device holds a plurality of time-series data, each containing data for two or more time points. The processor calculates, for each of the time-series data, a rate of change with respect to time; compares the distribution of the rates of change with a predetermined probability distribution; and determines as peculiar data any time-series data whose degree of deviation from the predetermined probability distribution satisfies a condition for peculiar-data determination, thereby evaluating the rates of change of the plurality of time-series data relative to one another. The processor further repeats the following process while changing the condition for peculiar-data determination so that the number or proportion of time-series data determined to be peculiar satisfies a predetermined condition: it compares the distribution of the rates of change of the time-series data that remain after excluding those already determined to be peculiar with the predetermined probability distribution, and determines as peculiar data any time-series data whose degree of deviation from the predetermined probability distribution satisfies the condition for peculiar-data determination.

According to a representative embodiment of the present invention, samples in longitudinal training data used for machine learning and the like can be evaluated relative to one another based on each sample's rate of change along the time axis, and peculiar samples can be extracted. Problems, configurations, and effects other than those described above will become clear from the description of the embodiment below.

FIG. 1 is a block diagram showing an example of the hardware configuration of an information processing device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the functional configuration of the information processing device according to the embodiment of the present invention.
FIG. 3 is an explanatory diagram showing an example of the stored contents of notification information according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram showing an example of non-continuous value data according to the embodiment of the present invention.
FIG. 5 is an explanatory diagram showing an example of continuous value data according to the embodiment of the present invention.
FIG. 6 is an explanatory diagram showing an example of change rate data according to the embodiment of the present invention.
FIG. 7 is an explanatory diagram showing an example of an input/output screen according to the embodiment of the present invention.
FIG. 8 is an explanatory diagram showing an example of an execution result screen according to the embodiment of the present invention.

The information processing device according to the present invention is described below with reference to the attached drawings. This specification presents an example of information processing that removes peculiar data for the purpose of predicting insurance payment risk in life insurance underwriting. In underwriting, the future insurance payment risk is assessed based on the information declared by the insurance applicant (hereinafter, notification information), and the application is either approved or declined. The notification information includes health checkup results, medical interview answers, medical history, and the like.

<Example of hardware configuration of information processing device>
FIG. 1 is a block diagram showing an example of the hardware configuration of an information processing device according to an embodiment of the present invention.

The information processing device 100 has a processor 101, a storage device 102, an input device 103, an output device 104, and a communication interface (communication IF) 105, which are connected by a bus 106. The processor 101 controls the information processing device 100. The storage device 102 serves as the work area of the processor 101 and is a non-transitory or transitory recording medium that stores various programs and data. Examples of the storage device 102 include a ROM (Read Only Memory), a RAM (Random Access Memory), an HDD (Hard Disk Drive), and a flash memory. The input device 103 inputs data; examples include a keyboard, a mouse, a touch panel, a numeric keypad, and a scanner. The output device 104 outputs data; examples include a display and a printer. The communication IF 105 connects to a network (not shown) and transmits and receives data.

<Example of functional configuration of information processing device 100>
FIG. 2 is a block diagram showing an example of the functional configuration of the information processing device 100 according to an embodiment of the present invention.

The information processing device 100 has an input unit 201, a data cleansing unit 202, a data classification unit 203, a time-dependent change rate calculation unit 204, a peculiar sample exclusion unit 205, and an output unit 206. Specifically, the data cleansing unit 202, the data classification unit 203, the time-dependent change rate calculation unit 204, and the peculiar sample exclusion unit 205 are realized, for example, by having the processor 101 execute a program stored in the storage device 102 shown in FIG. 1.

The information processing device 100 also stores notification information 300, non-continuous value data 400, continuous value data 500, and change rate data 600 in the storage device 102. The notification information 300 may be stored in the information processing device 100 in advance or acquired from another computer that can communicate with the information processing device 100. The notification information 300 input to the input unit 201 is described in detail first.

FIG. 3 is an explanatory diagram showing an example of the stored contents of the notification information 300 according to an embodiment of the present invention.

The notification information 300 is the information required for an insurance contract as declared by the contract applicant, and it is the data to be analyzed. The notification information 300 has basic notification information 310, health checkup results 320, medical interview results 330, and medical history 340. The basic notification information 310 is basic information about the contract applicant's declaration and includes a name ID 311, a date of birth 312, and an age 313.

The name ID 311 is identification information that uniquely identifies the contract applicant. The date of birth 312 is the date on which the contract applicant was born. The three entries in FIG. 3 for the contract applicant whose name ID 311 is "0001" represent that applicant's analysis target data for the past three years. The age 313 is the number of whole years elapsed since the contract applicant's date of birth 312. In the examples that follow, for the three entries of the contract applicant whose name ID 311 is "0001", the entry with age 313 of "47" is the first year of the time series, "48" the second year, and "49" the third year.

The health checkup results 320 are the results of the health checkup that the contract applicant underwent. They include weight 321, BMI (Body Mass Index) 322, systolic blood pressure 323, diastolic blood pressure 324, and fasting blood glucose 325. The weight 321 is the contract applicant's body weight. The BMI 322 is a physique index that expresses a person's degree of obesity and is calculated as weight / (height^2). A smaller BMI 322 indicates a thinner person, and a larger value indicates a heavier person.

The systolic blood pressure 323 is the pressure exerted on the wall of the aorta by blood pushed out by cardiac contraction while the heart is pumping blood into the aorta. The diastolic blood pressure 324 is the pressure on the wall of the aorta while blood is returning to the heart; it is lower because cardiac expansion draws blood from the aorta into the heart and reduces the volume of blood in the aorta. The fasting blood glucose 325 is the blood glucose level measured in a fasting state.

The medical interview results 330 are the results of the medical interview that the contract applicant underwent. They include smoking habits 331, drinking habits 332, and exercise habits 333. The smoking habits 331 indicate whether the contract applicant smokes, how often, and how much. The drinking habits 332 indicate whether the contract applicant drinks alcohol, how often, and how much. The exercise habits 333 indicate whether the contract applicant exercises, how often, and how much.

The medical history 340 is the record of the contract applicant's past medical consultations and hospitalizations. It includes a hypertension consultation history 341, a hypertension hospitalization history 342, and a diabetes consultation history 343. The hypertension consultation history 341 records the contract applicant's consultations for hypertension. The hypertension hospitalization history 342 records the contract applicant's hospitalizations for hypertension. The diabetes consultation history 343 records the contract applicant's consultations for diabetes.

Refer again to FIG. 2. The data cleansing unit 202 extracts samples that act as noise from the notification information 300 output by the input unit 201 and removes the noise. This improves the accuracy of peculiar-data removal and of the subsequent insurance payment risk prediction. Noise samples include missing data and erroneous data entries; cleansing deletes the former and corrects the latter.

The data classification unit 203 divides the cleansed notification information 300 output by the data cleansing unit 202 into continuous value data 500 and non-continuous value data 400 based on data type. Items such as weight 321 and BMI 322 are classified as continuous value data 500. Binary data such as smoking habits 331 and categorical data such as drinking habits 332 are classified as non-continuous value data 400. This allows processing that takes the data type into account and improves the accuracy of peculiar-data removal.
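The classification step can be sketched as follows. This is only an illustration under stated assumptions: each record is taken to be a Python dict, the field names (`name_id`, `weight`, `bmi`, `smoking`, `drinking`) are hypothetical, and a field is treated as continuous simply when its value is a number. The specification does not say how the data type is determined.

```python
def split_by_type(records, id_key="name_id"):
    """Split each record into a continuous part and a non-continuous part.

    Assumption: a field is continuous when its value is an int or float
    (bool excluded); every other field is treated as non-continuous.
    Both parts keep the ID so they stay linked to the same applicant.
    """
    continuous, non_continuous = [], []
    for record in records:
        cont = {id_key: record[id_key]}
        disc = {id_key: record[id_key]}
        for key, value in record.items():
            if key == id_key:
                continue
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            (cont if is_number else disc)[key] = value
        continuous.append(cont)
        non_continuous.append(disc)
    return continuous, non_continuous


# Hypothetical record for illustration only.
records = [
    {"name_id": "0001", "weight": 80.0, "bmi": 26.1,
     "smoking": True, "drinking": "once a week"},
]
continuous, non_continuous = split_by_type(records)
print(continuous)
print(non_continuous)
```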

FIG. 4 and FIG. 5 are explanatory diagrams showing examples of the non-continuous value data 400 and the continuous value data 500, respectively, according to an embodiment of the present invention.

The non-continuous value data 400 is a subset of the notification information 300 and contains the data that is not of a continuous-value type. For example, from the notification information 300 shown in FIG. 3, the non-continuous value data 400 is generated by associating the following with the name ID 311: smoking habits 331, drinking habits 332, exercise habits 333, and the like, which are categorical data indicating a category such as "none," "once a week," or "twice a week"; and the hypertension consultation history 341, hypertension hospitalization history 342, and diabetes consultation history 343, which are binary data taking the value "yes" or "no."

The continuous value data 500 is a subset of the notification information 300 and contains continuous-value data such as weight 321, BMI 322, systolic blood pressure 323, diastolic blood pressure 324, and fasting blood glucose 325, each associated with the name ID 311.

The continuous value data 500 is the set of time-series data evaluated by the information processing device 100 of this embodiment. For example, the data for one item (such as weight or BMI) observed for one person at multiple time points constitutes one time series. Although the number of observation points is not limited in this embodiment, it is expected to be small compared with typical time-series data, for example at least 2 and at most about 10. Such time-series data may be referred to as longitudinal data.

Refer again to FIG. 2. The time-dependent change rate calculation unit 204 calculates, for each item of the continuous value data 500, the rate of change over time. Let the number of time points in the continuous value data 500 be T, and let each time point be t (t = 1, 2, ..., T). If the continuous value data 500 contains M kinds of continuous-value items, then, for example, any continuous-value variable at time t for the sample whose name ID 311 is 0001 can be written as V(0001, t, m). Here m takes a value from 1 to M and identifies the item of continuous-value data, such as weight or BMI.

The time-dependent change rate calculation unit 204 calculates the rate of change U from the previous time point for every time point from t = 2 onward, that is, for V(0001, 2, m) through V(0001, T, m). The rate of change U at time t can be expressed by the following formula.

U(0001, t, m) = 100 × (V(0001, t, m) − V(0001, t−1, m)) / V(0001, t−1, m)

The calculated rates of change are stored in the change rate data 600. In this embodiment the temporal change of the continuous value data 500 is expressed as a rate of change, but it can be quantified in any other way, such as the amount of change from the previous time point or the rate of change from two time points earlier.
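As a worked illustration of the formula for U, the following minimal sketch computes the percentage change from the previous time point for one item of one sample. The function name `change_rates` and the example weights are hypothetical, and the handling of a zero previous value is an assumption, since the specification does not address that case.

```python
def change_rates(values):
    """Percentage change from the previous time point, for t = 2..T.

    values holds V(id, 1, m) .. V(id, T, m) for one sample and one item.
    Returns U(id, 2, m) .. U(id, T, m).
    """
    rates = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            # Assumption: a zero denominator is rejected rather than skipped.
            raise ValueError("previous value is zero; rate of change is undefined")
        rates.append(100.0 * (current - previous) / previous)
    return rates


# Hypothetical weights (kg) for three consecutive years.
weights = [80.0, 84.0, 63.0]
print(change_rates(weights))  # [5.0, -25.0]
```

A series with T observations yields T − 1 rates, so a sample with only two observations contributes a single rate.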

FIG. 6 is an explanatory diagram showing an example of the change rate data 600 according to an embodiment of the present invention.

The change rate data 600 includes a name ID 601 that identifies each contract applicant and the rate of change of that applicant's continuous value data at each time point. As an example, FIG. 6 shows, for the samples whose name ID 601 is 0001 and 0002 and for the continuous-value item m, the rate of change 602 at time 2, the rate of change 603 at time T−1, and the rate of change 604 at time T.

The peculiar sample exclusion unit 205 reads the rates of change U calculated by the time-dependent change rate calculation unit 204 from the change rate data 600, determines whether each sample exhibits a peculiar change compared with the other samples, and excludes a sample from the notification information 300 if its change over time is peculiar.

Assuming that the rates of change U of all samples follow a normal distribution, the peculiar sample exclusion unit 205 compares the distribution of U with the normal distribution, extracts the samples that deviate from it, and displays them through the output unit 206. The comparison uses, for example, a QQ (Quantile-Quantile) plot. A QQ plot is a known technique for checking visually whether U follows a normal distribution by plotting the quantiles of U against the corresponding theoretical quantiles of the normal distribution.

In a QQ plot, if the curve that smoothly connects the plotted quantiles is a straight line, the rate of change U can be judged to follow a normal distribution exactly. Conversely, when the plotted quantiles are approximated by a straight line, points that depart from the line can be judged to be outliers from the normal distribution. The peculiar sample exclusion unit 205 calculates confidence intervals for the QQ plot from multiple confidence coefficients and excludes, as peculiar samples, the samples that fall outside a selected confidence interval. This improves the accuracy of the subsequent insurance payment risk prediction.
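The specification does not state how the confidence interval of the QQ plot is computed. The sketch below therefore rests on assumptions: it fits the reference line with the median and the scaled median absolute deviation, and it uses a common pointwise band based on the standard error of an order statistic. The function name `qq_outliers` and the example data are hypothetical.

```python
import math
from statistics import NormalDist, median


def qq_outliers(rates, confidence=0.95):
    """Return indices of rates that fall outside a pointwise normal QQ band.

    Assumptions (not from the specification):
    - the reference line uses the median as center and 1.4826 * MAD as scale;
    - the band half-width is crit * (scale / pdf(z)) * sqrt(p * (1 - p) / n).
    """
    n = len(rates)
    std = NormalDist()  # standard normal distribution
    center = median(rates)
    scale = 1.4826 * median(abs(r - center) for r in rates)
    crit = std.inv_cdf((1 + confidence) / 2)
    order = sorted(range(n), key=lambda i: rates[i])
    flagged = []
    for rank, i in enumerate(order, start=1):
        p = (rank - 0.5) / n              # plotting position of this rank
        z = std.inv_cdf(p)                # theoretical normal quantile
        expected = center + scale * z     # point on the reference line
        half_width = crit * scale / std.pdf(z) * math.sqrt(p * (1 - p) / n)
        if abs(rates[i] - expected) > half_width:
            flagged.append(i)
    return sorted(flagged)


# Hypothetical rates of change (%) for ten samples; the last one jumps by 40%.
rates = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 40.0]
print(qq_outliers(rates, confidence=0.95))  # [9]
```

Robust estimates are used for the reference line so that a single extreme rate does not widen the band enough to hide itself; a mean and standard deviation fit would be an equally valid reading of the text.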

One example of an automatic extraction method in the peculiar sample exclusion unit 205 is to repeat the process until the extraction rate or the number of extracted samples reaches a value specified in advance. Specifically, the peculiar sample exclusion unit 205 calculates a confidence interval with a sufficiently large confidence coefficient, such as 0.999, and extracts the samples that fall outside it. If the extraction rate or count has reached the specified value, the process ends; otherwise the unit lowers the confidence coefficient and again extracts the samples that fall outside the new confidence interval. This is repeated, and the process ends once the extraction rate or count reaches the specified value.
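The loop described above can be sketched as follows. The band formula, the step size of 0.05, and the lower limit of 0.5 on the confidence coefficient are all assumptions made for illustration; the specification gives only the starting example of 0.999. The names `outside_band` and `extract_until_target` are hypothetical.

```python
import math
from statistics import NormalDist, median

STD = NormalDist()  # standard normal distribution


def outside_band(rates, confidence):
    """Indices of rates outside a pointwise QQ band (assumed band formula)."""
    n = len(rates)
    center = median(rates)
    scale = 1.4826 * median(abs(r - center) for r in rates)
    crit = STD.inv_cdf((1 + confidence) / 2)
    order = sorted(range(n), key=lambda i: rates[i])
    flagged = []
    for rank, i in enumerate(order, start=1):
        p = (rank - 0.5) / n
        z = STD.inv_cdf(p)
        half_width = crit * scale / STD.pdf(z) * math.sqrt(p * (1 - p) / n)
        if abs(rates[i] - (center + scale * z)) > half_width:
            flagged.append(i)
    return sorted(flagged)


def extract_until_target(rates, target_rate, start=0.999, step=0.05, floor=0.5):
    """Lower the confidence coefficient until enough samples are extracted.

    Stops when the extraction rate reaches target_rate, or when another
    step would push the confidence coefficient below floor (assumed guard).
    """
    confidence = start
    while True:
        flagged = outside_band(rates, confidence)
        reached = len(flagged) / len(rates) >= target_rate
        if reached or confidence - step < floor:
            return flagged, confidence
        confidence -= step


rates = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 40.0]
flagged, used = extract_until_target(rates, target_rate=0.1)
print(flagged, used)  # [9] 0.999
```

The floor keeps the loop from running indefinitely when the data contains fewer deviating samples than the target rate asks for.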

Alternatively, the peculiar sample exclusion unit 205 may repeat the following process: exclude the sample that deviates most from the normal distribution, compare the distribution of the rates of change U of the remaining samples with the normal distribution they follow, and then exclude the sample that deviates most in that comparison. The repetition may continue until the extraction rate or the number of excluded samples reaches a value specified in advance, or until the deviation from the normal distribution falls within a predetermined range (for example, until no sample falls outside the confidence interval for a predetermined confidence coefficient).
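This one-at-a-time variant can be sketched as below. The deviation measure (distance from the reference line divided by the band half-width) and the band formula are assumptions, as are the names `deviation_ratios` and `remove_most_deviant` and the sample IDs. The key point the sketch illustrates is that the reference line is refitted to the remaining samples after every removal.

```python
import math
from statistics import NormalDist, median

STD = NormalDist()  # standard normal distribution


def deviation_ratios(ids, rates, confidence):
    """Map each ID to |deviation from QQ line| / band half-width (assumed measure)."""
    n = len(rates)
    center = median(rates)
    scale = 1.4826 * median(abs(r - center) for r in rates)
    if scale == 0:
        raise ValueError("scale estimate is zero; band is undefined")
    crit = STD.inv_cdf((1 + confidence) / 2)
    order = sorted(range(n), key=lambda i: rates[i])
    ratios = {}
    for rank, i in enumerate(order, start=1):
        p = (rank - 0.5) / n
        z = STD.inv_cdf(p)
        half_width = crit * scale / STD.pdf(z) * math.sqrt(p * (1 - p) / n)
        ratios[ids[i]] = abs(rates[i] - (center + scale * z)) / half_width
    return ratios


def remove_most_deviant(samples, confidence=0.95, max_removed=None):
    """Repeatedly drop the most deviant sample, refitting after each removal.

    Stops when no sample lies outside the band (ratio <= 1), when
    max_removed samples have been dropped, or when fewer than 3 remain.
    """
    remaining = dict(samples)
    removed = []
    while len(remaining) >= 3:
        if max_removed is not None and len(removed) >= max_removed:
            break
        ids = list(remaining)
        ratios = deviation_ratios(ids, [remaining[s] for s in ids], confidence)
        worst = max(ratios, key=ratios.get)
        if ratios[worst] <= 1.0:
            break
        removed.append(worst)
        del remaining[worst]
    return removed, remaining


# Hypothetical name IDs mapped to rates of change (%).
samples = {f"{k:04d}": r for k, r in enumerate(
    [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 40.0], start=1)}
removed, remaining = remove_most_deviant(samples)
print(removed)         # ['0010']
print(len(remaining))  # 9
```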

The peculiar sample exclusion unit 205 can also display the QQ plot on the output unit 206 so that the user of the information processing device 100 extracts and excludes peculiar samples by visual inspection. This embodiment assumes that the rate of change U follows a normal distribution, but that is only an example; the QQ plot may be drawn under any assumed probability distribution. If it is difficult to specify a probability distribution in advance, many candidate distributions may be prepared and the iterative process used to search for the distribution that gives the smallest extraction rate or count of excluded samples (that is, the distribution from which U deviates least).

Instead of a QQ plot, the user may also inspect a histogram visually to extract and exclude peculiar samples. The set of rates of change U handled by the peculiar sample exclusion unit 205 can be processed in either of two ways for the M kinds of items: split by time point, so that changes at the same time point are processed together, or pooled regardless of time point, so that all changes over the same time span are processed together.
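The two ways of forming the sets of U can be sketched as a grouping step. The row layout `(sample_id, item, time, rate)` and the function name `group_rates` are assumptions for illustration; each resulting group would then be passed to the peculiar-sample check separately.

```python
from collections import defaultdict


def group_rates(rows, by_time=True):
    """Group rates of change for the peculiar-sample check.

    rows: iterable of (sample_id, item, time, rate) tuples (assumed layout).
    by_time=True  -> one group per (item, time): same-time changes together.
    by_time=False -> one group per item: time points pooled.
    """
    groups = defaultdict(list)
    for sample_id, item, time, rate in rows:
        key = (item, time) if by_time else item
        groups[key].append((sample_id, rate))
    return dict(groups)


# Hypothetical rates for two samples, one item, two time points.
rows = [
    ("0001", "weight", 2, 5.0),
    ("0001", "weight", 3, -25.0),
    ("0002", "weight", 2, 1.2),
    ("0002", "weight", 3, 0.8),
]
print(group_rates(rows, by_time=True))
print(group_rates(rows, by_time=False))
```

Pooling gives each check more data points, which matters when the number of samples is small, at the cost of ignoring any systematic difference between time points.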

<Screen example>
FIG. 7 is an explanatory diagram showing an example of an input/output screen according to an embodiment of the present invention.

The input/output screen 700 is displayed on a display, which is an example of the output device 104 of the information processing device 100, or on the display of another computer (not shown) that can communicate with the information processing device 100.

The input/output screen 700 includes a notification information load button 701, a processing item selection pull-down 705, a confidence coefficient selection bar 708, a processing execution button 710, and an execution result display area 800. The notification information 300 may be loaded when the user operates the notification information load button 701. Alternatively, when the user operates a notification information input button 702, the information processing device 100 may display a notification information input screen (not shown) so that the user can enter the notification information with the input device 103.

FIG. 8 is an explanatory diagram showing an example of an execution result screen according to an embodiment of the present invention.

The execution result display area 800 includes a plot screen 801 that displays the QQ plot drawn by the peculiar sample exclusion unit 205 and a peculiar sample ID screen 802 that displays the name IDs of the extracted peculiar samples.

The plot screen 801 shows the quantiles of the rate of change U on the vertical axis and the theoretical quantiles of the normal distribution on the horizontal axis, with each plotted value of U drawn as a black circle. If U follows the normal distribution exactly, all black circles lie on the thick solid line. The thin solid lines show the confidence intervals calculated from multiple confidence coefficients. The peculiar sample ID screen 802 lists the samples that fall outside the confidence interval calculated from the confidence coefficient specified with the confidence coefficient selection bar 708.

The embodiment above was described using the prediction of insurance payout risk in life insurance underwriting as an example, but the present invention is also applicable to, for example, corporate financial analysis. In that case, data stated in securities reports, or index data calculated from that data, is evaluated in place of the notification information 300. In general, the present invention can be applied to extract peculiar samples (outliers) from a collection of many time series in which each individual time series has only a few observation points.

The functions of the information processing device 100 of the embodiment above may also be provided via an API (Application Programming Interface). For example, on receiving the notification information 300 over a network connected to the communication IF 105, the information processing device 100 may store it in the storage device 102, execute the process shown in FIG. 2, and output over the network both the notification information from which peculiar samples have been excluded and the data needed to display the information shown in FIG. 7 and FIG. 8.

The system according to the embodiment of the present invention may also be configured as follows.

(1) An information processing device (for example, the information processing device 100) has a processor (for example, the processor 101) that executes a program and a storage device (for example, the storage device 102) that holds the program. The storage device holds a plurality of time series data (for example, the continuous value data 500), each including data at two or more points in time. The processor calculates a rate of change with respect to time for each of the plurality of time series data (for example, the processing of the time-dependent change rate calculation unit 204) and evaluates the rates of change of the plurality of time series data relative to one another (for example, the processing of the peculiar sample exclusion unit 205).

This makes it possible to evaluate samples relative to one another, based on each sample's rate of change along the time axis, in the longitudinal training data used for machine learning and the like.
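The rate-of-change calculation in (1) can be sketched as follows. This passage does not give a formula for the rate of change, so the sketch assumes the least-squares slope of value against time, which reduces to (last value - first value) / (last time - first time) when a sample has exactly two observations. The function names and the example data are hypothetical.

```python
def rate_of_change(series):
    """Least-squares slope of value against time for one sample.

    series: list of (time, value) pairs with at least two distinct times.
    Assumption: the slope is used as the rate of change.
    """
    if len(series) < 2:
        raise ValueError("at least two observation points are required")
    n = len(series)
    t_mean = sum(t for t, _ in series) / n
    v_mean = sum(v for _, v in series) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in series)
    if sxx == 0:
        raise ValueError("observation times must not all be identical")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in series)
    return sxy / sxx


def rates_for_all(samples):
    """Compute the rate of change for every sample ID."""
    return {sample_id: rate_of_change(s) for sample_id, s in samples.items()}


# Hypothetical continuous-value data: sample ID -> [(year, measured value), ...]
samples = {
    "A": [(2019, 22.0), (2021, 23.0)],
    "B": [(0, 1.0), (1, 1.0), (2, 1.0)],
}
rates = rates_for_all(samples)
```

The resulting dictionary of per-sample rates is the input that a relative evaluation across samples would then operate on.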

(2) In (1) above, the processor relatively evaluates the rates of change of the plurality of time series data by comparing the distribution of the rates of change with a predetermined probability distribution and determining the time series data whose degree of deviation of the rate of change from the predetermined probability distribution satisfies a condition for peculiar data determination to be peculiar data (for example, the processing of the peculiar sample exclusion unit 205).

This makes it possible to evaluate samples relative to one another, based on each sample's rate of change along the time axis, and to extract peculiar samples from the longitudinal training data used for machine learning and the like.

(3) In (2) above, the processor determines the time series data whose rate of change deviates most from the predetermined probability distribution to be peculiar data.

This makes it possible to appropriately exclude peculiar samples from the longitudinal training data used for machine learning and the like.

(4) In (3) above, the processor repeatedly executes a process of comparing the distribution of the rates of change of the plurality of time series data, excluding the time series data already determined to be peculiar data, with the predetermined probability distribution and determining the time series data whose rate of change deviates most from the predetermined probability distribution to be peculiar data. The process is repeated until the number or proportion of time series data determined to be peculiar data satisfies a predetermined condition, or until the degree of deviation of the rates of change from the predetermined probability distribution falls within a predetermined range.
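The repetition described in (3) and (4) can be sketched as a loop that removes one sample per iteration. In this sketch the predetermined probability distribution is assumed to be a normal distribution refitted to the remaining samples, the degree of deviation is assumed to be the absolute z-score, and the two stopping rules are expressed through the hypothetical parameters `max_deviation` and `max_fraction`. None of these choices is fixed by the description.

```python
from statistics import mean, stdev


def iteratively_exclude(rates, max_deviation=3.0, max_fraction=0.1):
    """Repeatedly flag the sample whose rate of change deviates most.

    rates: dict mapping a sample ID to its rate of change.
    Stops when the most deviating sample is within max_deviation standard
    deviations, or when the number of flagged samples reaches
    int(len(rates) * max_fraction).
    Returns (list of flagged IDs, dict of remaining samples).
    """
    remaining = dict(rates)
    peculiar = []
    limit = int(len(rates) * max_fraction)
    while len(peculiar) < limit and len(remaining) > 2:
        # Refit the assumed normal distribution without the flagged samples.
        mu = mean(remaining.values())
        sigma = stdev(remaining.values())
        if sigma == 0:
            break  # all remaining rates are identical; nothing deviates
        worst_id = max(remaining, key=lambda k: abs(remaining[k] - mu))
        if abs(remaining[worst_id] - mu) / sigma <= max_deviation:
            break  # deviation is within the accepted range
        peculiar.append(worst_id)
        del remaining[worst_id]
    return peculiar, remaining


# Hypothetical data: 20 ordinary samples and one extreme sample "x".
rates = {f"s{i}": 0.1 * (i % 5 - 2) for i in range(20)}
rates["x"] = 10.0
flagged, kept = iteratively_exclude(rates)
```

Refitting after every removal matters because an extreme sample inflates the fitted standard deviation and can mask less extreme ones until it has been excluded.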

(5) In (2) above, the processor repeatedly executes a process of comparing the distribution of the rates of change of the plurality of time series data, excluding the time series data already determined to be peculiar data, with the predetermined probability distribution and determining the time series data whose degree of deviation of the rate of change from the predetermined probability distribution satisfies the condition for peculiar data determination to be peculiar data. While repeating the process, the processor changes the condition for peculiar data determination so that the number or proportion of time series data determined to be peculiar data satisfies a predetermined condition.

(6) In (2) above, the processor generates a plurality of probability distributions by assuming that the distribution of the rates of change follows respectively different probability distribution models, compares the distribution of the rates of change with the plurality of probability distributions, and determines the probability distribution from which the rates of change deviate least as the predetermined probability distribution to be compared against for peculiar data determination.
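One way to picture the selection in (6) is to fit several candidate models to the rates of change and keep the one that lies closest to the empirical distribution. The sketch below assumes three candidate models (normal, Laplace, uniform), simple moment or order-statistic fits, and the Kolmogorov-Smirnov distance as the measure of deviation. These are illustrative choices only; the description does not name the candidate models or the distance measure.

```python
import math
from statistics import NormalDist, mean, median, stdev


def fit_normal(xs):
    """Return the CDF of a normal distribution fitted by mean and stdev."""
    return NormalDist(mean(xs), stdev(xs)).cdf


def fit_laplace(xs):
    """Return the CDF of a Laplace distribution fitted by median and mean absolute deviation."""
    loc = median(xs)
    scale = mean(abs(x - loc) for x in xs)

    def cdf(x):
        if x < loc:
            return 0.5 * math.exp((x - loc) / scale)
        return 1.0 - 0.5 * math.exp(-(x - loc) / scale)

    return cdf


def fit_uniform(xs):
    """Return the CDF of a uniform distribution on [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    return lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo)))


def ks_distance(xs, cdf):
    """Largest gap between the empirical CDF of xs and the model CDF."""
    ordered = sorted(xs)
    n = len(ordered)
    return max(
        max(abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
        for i, x in enumerate(ordered)
    )


def select_distribution(xs, models):
    """Fit every candidate model and return (best model name, all distances)."""
    distances = {name: ks_distance(xs, fit(xs)) for name, fit in models.items()}
    return min(distances, key=distances.get), distances


MODELS = {"normal": fit_normal, "laplace": fit_laplace, "uniform": fit_uniform}

# Hypothetical, evenly spread rates of change; assumes the values are not all equal.
evenly_spread = [i / 20 for i in range(21)]
best, distances = select_distribution(evenly_spread, MODELS)
```

For the evenly spread example values the uniform model gives the smallest distance, so it would serve as the comparison distribution under these assumptions.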

(7) In (2) above, the predetermined probability distribution is a distribution based on a predetermined probability distribution model that the distribution of the rates of change is assumed to follow (for example, the normal distribution or another distribution), and

the processor determines the time series data whose rate of change falls outside the confidence interval corresponding to a predetermined confidence coefficient of the predetermined probability distribution to be peculiar data.
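The determination in (7) can be sketched as follows, assuming a normal model fitted with the sample mean and sample standard deviation and a two-sided interval whose central probability equals the confidence coefficient. The function names and the example data are hypothetical, and the description may construct the interval differently (for instance, as a band around the QQ plot line).

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(rates, confidence):
    """Two-sided interval of the fitted normal model for a confidence coefficient.

    Assumes the rates are not all identical and 0 < confidence < 1.
    """
    values = list(rates.values())
    model = NormalDist(mean(values), stdev(values))
    tail = (1.0 - confidence) / 2.0
    return model.inv_cdf(tail), model.inv_cdf(1.0 - tail)


def peculiar_ids(rates, confidence=0.95):
    """Return the sorted IDs whose rate of change lies outside the interval."""
    lower, upper = confidence_interval(rates, confidence)
    return sorted(k for k, u in rates.items() if u < lower or u > upper)


# Hypothetical data: 20 ordinary samples and one extreme sample "x".
rates = {f"s{i}": 0.1 * (i % 5 - 2) for i in range(20)}
rates["x"] = 10.0
flagged_95 = peculiar_ids(rates, confidence=0.95)
```

Raising the confidence coefficient widens the interval and flags fewer samples, which is the effect a user would see when moving the confidence coefficient selection bar 708.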

(8) In (7) above, the processor repeatedly executes a process of comparing the distribution of the rates of change of the plurality of time series data, excluding the time series data already determined to be peculiar data, with the predetermined probability distribution and determining the time series data whose rate of change falls outside the confidence interval to be peculiar data. While repeating the process, the processor changes the confidence coefficient so that the number or proportion of time series data determined to be peculiar data satisfies a predetermined condition.
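A minimal sketch of the repetition in (8) is shown below. It assumes a normal model refitted to the remaining samples at each step, a fixed schedule of confidence coefficients tried from strict to loose, and a predetermined condition of the form "the flagged samples must not exceed a given fraction of all samples". The schedule, the parameter names, and the form of the condition are all assumptions for illustration.

```python
from statistics import NormalDist, mean, stdev


def outside_interval(rates, confidence):
    """IDs whose rate of change lies outside the fitted normal interval."""
    values = list(rates.values())
    model = NormalDist(mean(values), stdev(values))
    tail = (1.0 - confidence) / 2.0
    lower, upper = model.inv_cdf(tail), model.inv_cdf(1.0 - tail)
    return [k for k, u in rates.items() if u < lower or u > upper]


def exclude_with_budget(rates, coefficients=(0.999, 0.99, 0.95), max_fraction=0.1):
    """Lower the confidence coefficient step by step within a flagging budget.

    Returns (list of flagged IDs, last coefficient applied or None).
    Assumes the remaining rates are never all identical.
    """
    remaining = dict(rates)
    peculiar = []
    applied = None
    budget = int(len(rates) * max_fraction)
    for coefficient in coefficients:
        if len(remaining) < 3:
            break
        candidates = outside_interval(remaining, coefficient)
        if len(peculiar) + len(candidates) > budget:
            break  # applying this coefficient would violate the condition
        for sample_id in candidates:
            del remaining[sample_id]
        peculiar.extend(candidates)
        applied = coefficient
    return peculiar, applied


# Hypothetical data: 20 ordinary samples and one extreme sample "x".
rates = {f"s{i}": 0.1 * (i % 5 - 2) for i in range(20)}
rates["x"] = 10.0
flagged, applied = exclude_with_budget(rates)
```

In this example the extreme sample is flagged at the strictest coefficient, and the looser coefficients flag nothing further once the model has been refitted without it.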

(9) In (2) above, the information processing device further has an output device, and the output device displays a quantile-quantile plot based on the distribution of the rates of change and the predetermined probability distribution.

This makes it easier to confirm the presence of peculiar data visually.

The present invention is not limited to the embodiments described above and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the embodiments above were described in detail to explain the present invention clearly, and the present invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations may be added, deleted, or substituted.

Each of the configurations, functions, processing units, processing means, and the like described above may be implemented partly or wholly in hardware, for example by designing them as integrated circuits, or in software, with the processor 101 interpreting and executing programs that implement the respective functions.

Information such as programs, tables, and files that implement each function can be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a computer-readable non-transitory data recording medium such as an IC (Integrated Circuit) card, an SD card, or a DVD (Digital Versatile Disc).

The control lines and information lines shown are those considered necessary for the explanation and do not necessarily represent all the control lines and information lines required in an implementation. In practice, almost all components may be considered to be interconnected.

100 Information processing device
101 Processor
102 Storage device
202 Data cleansing unit
203 Data classification unit
204 Time-dependent change rate calculation unit
205 Peculiar sample exclusion unit
300 Notification information
400 Non-continuous value data
500 Continuous value data
600 Change rate data

Claims

1. An information processing device comprising:
a processor that executes a program; and a storage device that stores the program, wherein
the storage device holds a plurality of time series data each including data at two or more points in time, and
the processor:
calculates a rate of change with respect to time for each of the plurality of time series data;
compares the distribution of the rates of change with a predetermined probability distribution;
relatively evaluates the rates of change of the plurality of time series data by determining, as peculiar data, the time series data whose degree of deviation of the rate of change from the predetermined probability distribution satisfies a condition for peculiar data determination; and
repeatedly executes, while changing the condition for peculiar data determination so that the number or proportion of the time series data determined to be peculiar data satisfies a predetermined condition, a process of comparing the distribution of the rates of change of the plurality of time series data excluding the time series data determined to be peculiar data with the predetermined probability distribution, and determining, as peculiar data, the time series data whose degree of deviation of the rate of change from the predetermined probability distribution satisfies the condition for peculiar data determination.

2. The information processing device according to claim 1, wherein
the processor:
generates a plurality of probability distributions by assuming that the distribution of the rates of change follows respectively different probability distribution models;
compares the distribution of the rates of change with the plurality of probability distributions; and
determines the probability distribution from which the rates of change deviate least as the predetermined probability distribution to be compared against for peculiar data determination.

3. An information processing device comprising:
a processor that executes a program; and a storage device that stores the program, wherein
the storage device holds a plurality of time series data each including data at two or more points in time,
the processor:
calculates a rate of change with respect to time for each of the plurality of time series data;
compares the distribution of the rates of change with a predetermined probability distribution; and
relatively evaluates the rates of change of the plurality of time series data by determining, as peculiar data, the time series data whose degree of deviation of the rate of change from the predetermined probability distribution satisfies a condition for peculiar data determination,
the predetermined probability distribution is a distribution based on a predetermined probability distribution model that the distribution of the rates of change is assumed to follow,
the processor determines, as peculiar data, the time series data whose rate of change falls outside a confidence interval corresponding to a predetermined confidence coefficient of the predetermined probability distribution, and
the processor repeatedly executes, while changing the confidence coefficient so that the number or proportion of the time series data determined to be peculiar data satisfies a predetermined condition, a process of comparing the distribution of the rates of change of the plurality of time series data excluding the time series data determined to be peculiar data with the predetermined probability distribution, and determining, as peculiar data, the time series data whose rate of change falls outside the confidence interval.

4. The information processing device according to claim 1,
further comprising an output device, wherein
the output device displays a quantile-quantile plot based on the distribution of the rates of change and the predetermined probability distribution.

5. An information processing method executed by an information processing device, wherein
the information processing device includes a processor that executes a program and a storage device that stores the program,
the storage device holds a plurality of time series data each including data at two or more points in time,
the information processing method includes:
a step in which the processor calculates a rate of change with respect to time for each of the plurality of time series data;
a step in which the processor compares the distribution of the rates of change with a predetermined probability distribution; and
a step in which the processor relatively evaluates the rates of change of the plurality of time series data by determining, as peculiar data, the time series data whose degree of deviation of the rate of change from the predetermined probability distribution satisfies a condition for peculiar data determination, and
in the step of relatively evaluating the rates of change of the plurality of time series data, the processor repeatedly executes, while changing the condition for peculiar data determination so that the number or proportion of the time series data determined to be peculiar data satisfies a predetermined condition, a process of comparing the distribution of the rates of change of the plurality of time series data excluding the time series data determined to be peculiar data with the predetermined probability distribution, and determining, as peculiar data, the time series data whose degree of deviation of the rate of change from the predetermined probability distribution satisfies the condition for peculiar data determination.

6. An information processing program for controlling an information processing device, wherein
the information processing device includes a processor that executes the information processing program and a storage device that stores the information processing program,
the storage device holds a plurality of time series data each including data at two or more points in time,
the information processing program includes:
a step of calculating a rate of change with respect to time for each of the plurality of time series data;
a step of comparing the distribution of the rates of change with a predetermined probability distribution; and
a step of relatively evaluating the rates of change of the plurality of time series data by determining, as peculiar data, the time series data whose degree of deviation of the rate of change from the predetermined probability distribution satisfies a condition for peculiar data determination, and
in the step of relatively evaluating the rates of change of the plurality of time series data, the information processing program causes the processor to repeatedly execute, while changing the condition for peculiar data determination so that the number or proportion of the time series data determined to be peculiar data satisfies a predetermined condition, a process of comparing the distribution of the rates of change of the plurality of time series data excluding the time series data determined to be peculiar data with the predetermined probability distribution, and determining, as peculiar data, the time series data whose degree of deviation of the rate of change from the predetermined probability distribution satisfies the condition for peculiar data determination.