JP5068864B2

JP5068864B2 - Logistic regression analysis system and logistic regression analysis program

Info

Publication number: JP5068864B2
Application number: JP2011034012A
Authority: JP
Inventors: 薫大田; 美帆子野間
Original assignee: SCSK Corp
Current assignee: SCSK Corp
Priority date: 2011-02-18
Filing date: 2011-02-18
Publication date: 2012-11-07
Anticipated expiration: 2031-02-18
Also published as: JP2012173899A

Description

The present invention relates to a logistic regression analysis system and a logistic regression analysis program.

When a payment is made by credit card (hereinafter, "card") at a store or the like, the store requests a credit inquiry by transmitting information such as the card number to the card company. On receiving the request, the card company checks the remaining credit balance of the card and also judges the possibility that the card use is fraudulent. This judgment uses an analysis model called a scoring model. Various scoring models have been proposed, one of which applies logistic regression analysis (see, for example, Patent Document 1).

A scoring model to which such logistic regression analysis is applied is a model for calculating a score corresponding to the probability that fraudulent card use is expected to occur (hereinafter, the "expected fraud rate"). Explanatory variables that can affect the expected fraud rate are calculated from a plurality of data items, and the score is calculated by inputting the calculated explanatory variables into a predetermined logistic regression function.
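As an illustration of the score calculation described above, the following is a minimal sketch of a standard logistic function applied to a linear combination of explanatory variables. The function name `logistic_score`, the intercept, and the coefficient values are hypothetical; the source does not disclose the concrete form or coefficients of its "predetermined logistic regression function".

```python
import math


def logistic_score(intercept, coefficients, explanatory_values):
    """Return the logistic function of a linear combination of explanatory variables.

    All argument names are illustrative assumptions, not taken from the patent.
    """
    if len(coefficients) != len(explanatory_values):
        raise ValueError("one coefficient is required per explanatory variable")
    z = intercept + sum(b * x for b, x in zip(coefficients, explanatory_values))
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical usage: two explanatory variables with made-up coefficients.
score = logistic_score(-3.0, [0.8, 1.5], [1.0, 0.0])
```

The returned value lies between 0 and 1 and can be read as an estimated probability, which is what makes the logistic form suitable for an expected fraud rate.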

Patent Document 1: JP 2007-207011 A

However, it has been known that certain constraints apply to the explanatory variables that can be input to a logistic regression analysis. For example, when a data category accounts for only a small composition ratio within a particular data item, the variable category of the explanatory variable corresponding to that data category gives rise to the problems of so-called complete separation or quasi-complete separation. Such a variable category therefore cannot be handled by logistic regression analysis and could not be input to the scoring model as is.
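The problem described above can be made concrete with a small sketch. It assumes, hypothetically, that past observations are available as `(category, is_event)` pairs, and it reports for each data category its composition ratio and whether every observation in the category has the same outcome. The latter is a common symptom of separation, because the coefficient estimate of a dummy variable for such a category does not converge to a finite value. The function and field names are illustrative and do not appear in the source.

```python
from collections import defaultdict


def summarize_categories(records):
    """Summarize (category, is_event) pairs per data category.

    Returns a dict: category -> {"composition_ratio", "single_outcome"}.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [event count, non-event count]
    total = 0
    for category, is_event in records:
        counts[category][0 if is_event else 1] += 1
        total += 1
    summary = {}
    for category, (events, non_events) in counts.items():
        summary[category] = {
            # Share of all observations that fall into this category.
            "composition_ratio": (events + non_events) / total,
            # True when the category contains only one kind of outcome,
            # which is where separation problems typically appear.
            "single_outcome": events == 0 or non_events == 0,
        }
    return summary
```

Rare categories are the ones most likely to end up with a single outcome, which is why a small composition ratio and separation tend to go together.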

For example, it is known that when goods that are easily converted to cash are purchased with a card, fraudulent use of the card is more likely than when goods that are hard to convert to cash are purchased. For such highly cashable goods, it is therefore thought that the accuracy of the scoring model could be improved by subdividing the data categories down to the level of individual product codes (codes set in advance to uniquely identify each type of product, for example codes that distinguish precious metals, gift certificates, prepaid cards, and so on) and analyzing at that level. With such subdivision, however, the composition ratio of each data category (in this example, the product code for precious metals, the product code for gift certificates, the product code for prepaid cards, and so on) within the data item (in this example, the product codes of all products, including highly cashable ones) becomes small. Because the variable categories of the explanatory variables corresponding to data categories with such small composition ratios cannot be input to the scoring model as is, the subdivided analysis ultimately could not be performed, and the accuracy of the scoring model could not be improved.

The present invention has been made in view of the above, and its object is to provide a logistic regression analysis system and a logistic regression analysis program that make it possible to perform logistic regression analysis using variable categories of explanatory variables that correspond to data categories having a small composition ratio within a data item.

The logistic regression analysis system according to claim 1 calculates the probability that a predetermined event will occur by inputting, into a predetermined logistic regression function, variable categories of a plurality of explanatory variables that can affect that probability, the variable categories corresponding to data categories included in each of a plurality of data items. The system comprises: composition ratio storage means for storing the composition ratio of the data category of the data item corresponding to each variable category; event occurrence probability storage means for storing, for each data category, information necessary for calculating the probability that the event occurred in the past; validity storage means for storing a numerical value indicating the validity of each of the plurality of explanatory variables relative to the plurality of explanatory variables as a whole; determination means for determining whether a variable category of an explanatory variable can be input to the logistic regression function, which, when a predetermined plurality of data items for calculating the explanatory variables to be input to the logistic regression function have been acquired by a predetermined method, acquires the composition ratio of the data category of each data item from the composition ratio storage means, determines whether the acquired composition ratio is less than a predetermined value, and, if it is less than the predetermined value, determines that the variable category corresponding to the data item of that data category is a variable category that cannot be input to the logistic regression function; probability calculation means for acquiring, from the event occurrence probability storage means, the information corresponding to the data category of the data item corresponding to the variable category that the determination means has determined cannot be input, and calculating, based on the acquired information, the probability that the event occurred in the past for that data category; integerization means for converting the probability calculated by the probability calculation means into an integer using a predetermined multiple; and expansion means for acquiring, from the validity storage means, the numerical value indicating the validity corresponding to the explanatory variable of the variable category that the determination means has determined cannot be input, and expanding the probability integerized by the integerization means based on the acquired numerical value, thereby generating a converted value of the variable category of the explanatory variable corresponding to the data category for which the probability was calculated.

The logistic regression analysis system according to claim 2 is the logistic regression analysis system according to claim 1, wherein the composition ratio storage means stores the composition ratio of the minimum-unit data category included in the data item, and the determination means acquires the composition ratio of the minimum-unit data category included in the data item from the composition ratio storage means, determines whether the acquired composition ratio is less than a predetermined value, and, if it is less than the predetermined value, determines that the variable category corresponding to the data item of that data category is a variable category that cannot be input to the logistic regression function.

The logistic regression analysis system according to claim 3 is the logistic regression analysis system according to claim 1 or 2, wherein the probability calculation means calculates the probability that the event occurred in the past as a percentage value, and the integerization means converts the probability calculated by the probability calculation means into an integer using a predetermined multiple of 100 or more.

The logistic regression analysis system according to claim 4 is the logistic regression analysis system according to any one of claims 1 to 3, and is a system that calculates a score corresponding to the probability that fraudulent use of a credit card will occur, using a scoring model to which the logistic regression function is applied. The system comprises score calculation means for calculating the score by inputting the converted value of the variable category of the explanatory variable generated by the expansion means into the scoring model. The event occurrence probability storage means stores, as the information necessary for calculating the probability that the event occurred in the past, information that associates each data category with the number of fraudulent uses and the number of genuine uses recorded when the credit card was used in the past for that data category. The probability calculation means acquires, from the event occurrence probability storage means, the number of fraudulent uses and the number of genuine uses corresponding to the data category of the data item corresponding to the variable category that the determination means has determined cannot be input, and, based on the acquired numbers, calculates the fraud occurrence rate for past credit card use in that data category as the probability that the event occurred in the past.
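The fraud occurrence rate described above might be computed as in the following sketch. It assumes that the rate is the number of fraudulent uses divided by the sum of fraudulent and genuine uses; the source states only that the rate is calculated "based on" the two counts, so the denominator and the function name `past_fraud_rate` are assumptions.

```python
def past_fraud_rate(fraud_count, genuine_count):
    """Return the past fraud occurrence rate for one data category.

    Assumption: rate = fraud_count / (fraud_count + genuine_count).
    """
    if fraud_count < 0 or genuine_count < 0:
        raise ValueError("counts must be non-negative")
    total = fraud_count + genuine_count
    if total == 0:
        raise ValueError("no past card use recorded for this data category")
    return fraud_count / total


# Hypothetical counts for a rare product code: 3 fraudulent and 997 genuine uses.
rate = past_fraud_rate(3, 997)
```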

The logistic regression analysis program according to claim 5 calculates the probability that a predetermined event will occur by inputting, into a predetermined logistic regression function, variable categories of a plurality of explanatory variables that can affect that probability, the variable categories corresponding to data categories included in each of a plurality of data items. The program is executed on a computer comprising: composition ratio storage means for storing the composition ratio of the data category of the data item corresponding to each variable category; event occurrence probability storage means for storing, for each data category, information necessary for calculating the probability that the event occurred in the past; and validity storage means for storing a numerical value indicating the validity of each of the plurality of explanatory variables relative to the plurality of explanatory variables as a whole. The program causes the computer to function as: determination means for determining whether a variable category of an explanatory variable can be input to the logistic regression function, which, when a predetermined plurality of data items for calculating the explanatory variables to be input to the logistic regression function have been acquired by a predetermined method, acquires the composition ratio of the data category of each data item from the composition ratio storage means, determines whether the acquired composition ratio is less than a predetermined value, and, if it is less than the predetermined value, determines that the variable category corresponding to the data item of that data category is a variable category that cannot be input to the logistic regression function; probability calculation means for acquiring, from the event occurrence probability storage means, the information corresponding to the data category of the data item corresponding to the variable category that the determination means has determined cannot be input, and calculating, based on the acquired information, the probability that the event occurred in the past for that data category; integerization means for converting the probability calculated by the probability calculation means into an integer using a predetermined multiple; and expansion means for acquiring, from the validity storage means, the numerical value indicating the validity corresponding to the explanatory variable of the variable category that the determination means has determined cannot be input, and expanding the probability integerized by the integerization means based on the acquired numerical value, thereby generating a converted value of the variable category of the explanatory variable corresponding to the data category for which the probability was calculated.

According to the logistic regression analysis system of claim 1 or the logistic regression analysis program of claim 5, a variable category that conventionally could not be input to a logistic regression analysis, because it is a variable category of an explanatory variable corresponding to a data category with a small composition ratio within a data item, can be converted into a converted value and input to the logistic regression analysis. It therefore becomes possible to perform logistic regression analysis with data categories subdivided more finely than before, and to improve the accuracy of the logistic regression analysis.

According to the logistic regression analysis system of claim 2, whether the variable category of an explanatory variable can be input to the logistic regression function is determined based on the composition ratio of the minimum-unit data category included in the data item. It therefore becomes possible to perform logistic regression analysis with the data categories subdivided down to the minimum unit, and to further improve the accuracy of the logistic regression analysis.

According to the logistic regression analysis system of claim 3, the probability calculated as a percentage value can be converted into an integer by applying a predetermined multiple of 100 or more.
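The integerization step can be sketched as follows. The sketch assumes that the percentage value is multiplied by the predetermined multiple and then rounded to the nearest integer; the source does not state how any remaining fractional part is handled, so the use of `round` and the name `integerize_percentage` are assumptions.

```python
def integerize_percentage(probability_percent, multiple=100):
    """Convert a probability expressed as a percentage into an integer.

    Assumption: multiply by the predetermined multiple (100 or more) and
    round to the nearest integer with Python's built-in round().
    """
    if multiple < 100:
        raise ValueError("the predetermined multiple must be 100 or more")
    return round(probability_percent * multiple)


# Hypothetical usage: a past fraud rate of 0.37 % with a multiple of 100.
value = integerize_percentage(0.37, 100)
```

A larger multiple preserves more decimal places of the percentage value in the resulting integer.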

According to the logistic regression analysis system of claim 4, for a scoring model that calculates a score corresponding to the probability that fraudulent use of a credit card will occur, a variable category that conventionally could not be input to the scoring model, because it is a variable category of an explanatory variable corresponding to a data category with a small composition ratio within a data item, can be converted into a converted value and input to the logistic regression analysis. It therefore becomes possible to calculate the score with the scoring model using data categories subdivided more finely than before, and to improve the accuracy of the scoring model. As a result, the probability that fraudulent use of a credit card will occur can be judged with higher accuracy.

A block diagram functionally and conceptually showing the electrical configuration of an approval system constructed to include a fraud detection system according to one embodiment of the present invention.
A diagram showing a configuration example of member information.
A diagram showing a configuration example of first usage history information.
A diagram showing a configuration example of second usage history information.
A diagram showing a configuration example of fraud determination result information.
A diagram showing a configuration example of scoring information.
A flowchart of fraud detection processing.
A flowchart of information accumulation and generation processing.
A flowchart of regression analysis processing.
A diagram illustrating various kinds of information corresponding to the data categories included in a data item.
A flowchart of explanatory variable conversion processing.
A diagram illustrating a setting example of the validity of explanatory variables.
A flowchart of fraud determination processing.
A flowchart of system management processing.

Embodiments of the logistic regression analysis system and the logistic regression analysis program according to the present invention are described in detail below with reference to the accompanying drawings.

The logistic regression analysis system and the logistic regression analysis program perform logistic regression analysis for calculating the probability that a predetermined event will occur. The event to be calculated is arbitrary; the system and the program can be used for, for example, the expected fraud rate of a card, the credit risk of a company, or the probability of onset of a disease. The following describes an example applied to the expected fraud rate of a card, referring to the "logistic regression analysis system" as the "fraud detection system" and to the "logistic regression analysis program" as the "fraud detection program".

(Configuration)
First, the configuration of the fraud detection system will be described. FIG. 1 is a block diagram functionally and conceptually showing the electrical configuration of an approval system constructed to include the fraud detection system according to the present embodiment. As shown in FIG. 1, the approval system 1 is a system for approving (authorizing) card use, and is configured by connecting a fraud detection system 10, a business host 20, an approval host (authorization host) 30, a fraud determination terminal 40, and a system management terminal 50 so that they can communicate with one another via a network 60 such as the Internet.

(Configuration - Fraud Detection System)
The fraud detection system 10 is a system for detecting fraudulent use of a card, and comprises a network interface (hereinafter, "network IF") 11, a storage unit 12, and a control unit 13.

The network IF 11 is input means for receiving the information necessary for the processing of the fraud detection system 10, and is also output means for outputting information from the fraud detection system 10 to the outside. It is configured, for example, as a known network board.

The storage unit 12 is storage means for storing various kinds of information necessary for the processing of the fraud detection system 10, and is configured by, for example, a hard disk or another recording medium. It comprises a member information DB 12a (hereinafter, a database is referred to as a "DB"), a usage history information DB 12b, a fraud determination result information DB 12c, and a scoring information DB 12d. The details of each DB are described later.

The control unit 13 is control means for controlling the fraud detection system 10. Specifically, it is a computer comprising a CPU, various programs interpreted and executed on the CPU (including a basic control program such as an OS and application programs that are started on the OS and implement specific functions), and internal memory such as RAM for storing the programs and various data. In particular, the fraud detection program according to the present embodiment is stored on a computer-readable recording medium and, by being installed on the fraud detection system 10 from that recording medium, substantially constitutes each part of the control unit 13.

In functional and conceptual terms, the control unit 13 comprises an information accumulation and generation unit 14, a regression analysis unit 15, a fraud determination unit 16, and a system management unit 17. The information accumulation and generation unit 14 is information accumulation and generation means that accumulates various kinds of information in the storage unit 12 and generates various kinds of information. The regression analysis unit 15 is regression analysis means that calculates a score based on a scoring model to which logistic regression analysis is applied. The fraud determination unit 16 is fraud determination means that performs the processing necessary for judging fraudulent use of a credit card using the score calculated by the regression analysis unit 15. The system management unit 17 is system management means that performs processing related to the analysis and evaluation of the scoring model. The specific functions of these units are described later.

Furthermore, the regression analysis unit 15 comprises a determination unit 15a, a probability calculation unit 15b, an integerization unit 15c, an expansion unit 15d, and a score calculation unit 15e. The determination unit 15a is determination means that determines whether the variable category of an explanatory variable can be input to the logistic regression function, based on the composition ratio of the data category of the data item corresponding to that variable category. The probability calculation unit 15b is probability calculation means that calculates the probability that the event occurred in the past for the data category of the data item corresponding to a variable category that the determination unit 15a has determined cannot be input. The integerization unit 15c is integerization means that converts the probability calculated by the probability calculation unit 15b into an integer using a predetermined multiple. The expansion unit 15d is expansion means that expands the probability integerized by the integerization unit 15c, based on the validity of the explanatory variable of the variable category that the determination unit 15a has determined cannot be input, and thereby generates a converted value of the variable category of the explanatory variable corresponding to the data category for which the probability was calculated. The score calculation unit 15e is score calculation means that calculates the score by inputting the converted value of the variable category of the explanatory variable generated by the expansion unit 15d into the scoring model. The specific functions of these units and the meanings of the terms are described later.
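The role of the determination unit 15a can be sketched as a simple partition of data categories by composition ratio. The sketch assumes that the composition ratios are available as a mapping from data category to ratio and that the predetermined value is passed as `threshold`; the function name, the example categories, and the threshold of 0.05 are hypothetical. The comparison is "less than", following the wording of the claims.

```python
def split_by_composition_ratio(composition_ratios, threshold):
    """Partition data categories by whether they can be input directly.

    composition_ratios: dict mapping data category -> composition ratio.
    Returns (inputtable, needs_conversion) as two lists of categories.
    """
    inputtable, needs_conversion = [], []
    for category, ratio in composition_ratios.items():
        if ratio < threshold:
            # Below the predetermined value: cannot be input as is and is
            # handed to the conversion path (units 15b to 15d in the text).
            needs_conversion.append(category)
        else:
            inputtable.append(category)
    return inputtable, needs_conversion


# Hypothetical composition ratios for product-code categories.
ratios = {"general goods": 0.95, "precious metals": 0.02, "gift certificates": 0.03}
direct, to_convert = split_by_composition_ratio(ratios, threshold=0.05)
```

Only the categories in the second list would go through probability calculation, integerization, and expansion; the expansion step itself is not sketched here because this passage does not describe how the validity value is applied.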

Next, the details of each DB in the storage unit 12 will be described. The information described below as being stored in each DB is merely an example; in practice, part of it can be omitted, other information can be added, or it can be replaced with other information.

The member information DB 12a in FIG. 1 is member information storage means for storing information (hereinafter, "member information") about persons registered as legitimate users of a card (hereinafter, "members"). As shown in the configuration example of FIG. 2, the member information associates items such as "user member number", "total card credit limit", "total card cashing limit", "user date of birth", "user gender", "card expiration date", and "user enrollment date" with the data corresponding to each item. The information corresponding to the item "user member number" is the member number uniquely assigned to each member. The information corresponding to the item "total card credit limit" is the total amount available on each member's card. The information corresponding to the item "total card cashing limit" is the total amount available for cash advances on each member's card. The information corresponding to the item "user date of birth" is each member's date of birth. The information corresponding to the item "user gender" is each member's gender; for example, "0" indicates male and "1" indicates female. The information corresponding to the item "card expiration date" is the expiration date of each member's card. The information corresponding to the item "user enrollment date" is the date on which each member enrolled for the card.
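One way to picture a member information record is the following sketch. The field names are English renderings of the items listed above, and the field types (string, integer, date) are assumptions, since the source gives only the item names and the gender coding of "0" for male and "1" for female.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Gender(IntEnum):
    """Gender coding given as an example in the text."""
    MALE = 0
    FEMALE = 1


@dataclass(frozen=True)
class MemberInfo:
    """One member information record (field types are assumptions)."""
    user_member_number: str          # uniquely assigned to each member
    total_card_credit_limit: int     # total amount available on the card
    total_card_cashing_limit: int    # total amount available for cash advances
    user_date_of_birth: date
    user_gender: Gender
    card_expiration_date: date
    user_enrollment_date: date


# Hypothetical record with made-up values.
member = MemberInfo(
    user_member_number="0000000001",
    total_card_credit_limit=500_000,
    total_card_cashing_limit=100_000,
    user_date_of_birth=date(1980, 1, 1),
    user_gender=Gender.FEMALE,
    card_expiration_date=date(2015, 3, 31),
    user_enrollment_date=date(2010, 4, 1),
)
```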

The usage history information DB 12b in FIG. 1 is usage history information storage means for storing information about the usage history of cards (hereinafter, "usage history information"). In the present embodiment, the usage history information includes first usage history information and second usage history information. The first usage history information includes the approval information (authorization information) transmitted from the approval host 30. As shown in the configuration example of FIG. 3, it associates items such as "unique key", "user member number", "approval reception date", "approval reception time", and "usage amount" with the data corresponding to each item. The information corresponding to each item other than "unique key" is information included in the approval information transmitted from the approval host 30. The information corresponding to the item "unique key" is a value that the control unit 13 automatically assigns so that it is unique, and it serves as the primary key of the first usage history information. The information corresponding to the item "user member number" is the member number uniquely assigned to each member. The information corresponding to the item "approval reception date" is the date on which the approval request was received, the information corresponding to the item "approval reception time" is the time at which the approval request was received, and the information corresponding to the item "usage amount" is the amount subject to approval. The approval information transmitted from the approval host 30 also includes various other kinds of information (for example, the product code used in the specific example described later), but their description is omitted in the present embodiment.

The second usage history information includes information generated by applying statistical processing to the card usage history. As shown in the configuration example of FIG. 4, it consists of items such as “user member number”, “approval acceptance date”, “approval acceptance time”, “elapsed time since 1 use ago”, “elapsed time since n uses ago”, “time of use 1 use ago”, “time of use n uses ago”, “channel of use 1 use ago”, and “channel of use n uses ago”, each associated with its corresponding data. The information for “user member number”, “approval acceptance date”, and “approval acceptance time” is the same as the information for the identically named items in the first usage history information of FIG. 3. The information for “elapsed time since 1 use ago” is the time elapsed between the previous card use by the member subject to approval and the current card use. The information for “elapsed time since n uses ago” is the time elapsed between that member's card use n uses ago and the current card use. Here, “n” is an integer and, although not illustrated, the second usage history information also contains an item “elapsed time since x uses ago” and its corresponding information for each integer x between 1 and n (x = 2, 3, ..., n-1); the same applies to the other information expressed using n. The information for “time of use 1 use ago” is the time at which the member subject to approval made the previous card use. The information for “time of use n uses ago” is the time at which that member made the card use n uses ago. The information for “channel of use 1 use ago” is the channel through which the member subject to approval made the previous card use. A channel is a route of card use and is specified, for example, by combining either domestic or overseas with either face-to-face or non-face-to-face, giving four channels: “face-to-face domestic”, “face-to-face overseas”, “non-face-to-face domestic”, and “non-face-to-face overseas”. The information for “channel of use n uses ago” is the channel through which that member made the card use n uses ago.

The fraud determination result information DB 12c in FIG. 1 is fraud determination result information storage means that stores information on the results of card fraud determination (hereinafter, fraud determination result information). As shown in the configuration example of FIG. 5, the fraud determination result information consists of items such as “user member number”, “approval acceptance date”, “approval acceptance time”, “hit rule ID”, and “score”, each associated with its corresponding data. The information for “user member number”, “approval acceptance date”, and “approval acceptance time” is the same as the information for the identically named items in the first usage history information of FIG. 3. The information for “hit rule ID” is rule specifying information that uniquely identifies, among the rules used for card fraud determination, the rule that matched the information of each approval (the hit rule). The information for “score” is the score calculated using the scoring model. In practice, the fraud determination result information additionally contains the approval information received from the approval host 30, the member information acquired from the member information DB 12a, and the first analysis target history information and second analysis target history information (described later) acquired from the usage history information DB 12b; these are omitted from FIG. 5 for convenience of illustration.

The scoring information DB 12d in FIG. 1 is scoring information storage means that stores information on fraud determination using the scoring model (hereinafter, scoring information). As shown in the configuration example of FIG. 6, the scoring information consists of items such as “user member number”, “approval acceptance date”, “approval acceptance time”, “explanatory variable 1”, “explanatory variable n”, and “score”, each associated with its corresponding data. The information for “user member number”, “approval acceptance date”, and “approval acceptance time” is the same as the information for the identically named items in the first usage history information of FIG. 3. The information for “explanatory variable 1” is the first explanatory variable input to the scoring model. The information for “explanatory variable n” is the n-th explanatory variable input to the scoring model. The information for “score” is the score calculated using the scoring model.

(Configuration - Business Host)
Next, the business host 20 in FIG. 1 will be described. The business host 20 is a host computer that manages member information. It stores the member information shown in FIG. 2 and transmits this member information to the fraud detection system 10 as necessary.

(Configuration - Approval Host)
The approval host 30 in FIG. 1 is a host computer that receives and processes approval information transmitted for inquiry from a card terminal (not shown) installed in a store or the like, and transmits this approval information and related information to the fraud detection system 10 as necessary.

(Configuration - Fraud Determination Terminal)
The fraud determination terminal 40 in FIG. 1 is the terminal of a person in charge of fraud determination work. It is, for example, a terminal installed at a card company and is configured in the same manner as a known personal computer.

(Configuration - System Management Terminal)
The system management terminal 50 in FIG. 1 is the terminal of a person in charge of managing the fraud detection system 10. It is, for example, a terminal installed at a system company and is configured in the same manner as a known personal computer.

(Processing)
Next, the fraud detection process executed by the approval system 1 configured as described above will be described. FIG. 7 is a flowchart of the fraud detection process executed by the fraud detection system 10 (in the description of each process below, “step” is abbreviated as “S”). The fraud detection process is started repeatedly, for example, after the fraud detection system 10 has started up, and sequentially executes an information accumulation and generation process (SA1), a regression analysis process (SA2), a fraud determination process (SA3), and a system management process (SA4). Each of these processes is described in turn below.

(Processing - Information Accumulation and Generation Process)
First, the information accumulation and generation process will be described. This process accumulates and generates various kinds of information and is executed by the information accumulation and generation unit 14. FIG. 8 is a flowchart of the information accumulation and generation process. When member information is newly registered or updated in the business host 20 by known means, the business host 20 transmits that member information to the fraud detection system 10. On receiving member information from the business host 20 (SB1, Yes), the information accumulation and generation unit 14 of the fraud detection system 10 adds the newly registered member information to the member information DB 12a, or reflects the updated member information in the member information DB 12a (SB2).

When a card is used at a store or the like, a card terminal (not shown) installed at the store transmits to the approval host 30, together with the approval information, a request to check the credit balance and to check for unauthorized use. On receiving this inquiry request, the approval host 30 transmits the approval information to the fraud detection system 10. On receiving the approval information from the approval host 30 (SB3, Yes), the information accumulation and generation unit 14 of the fraud detection system 10 acquires from the member information DB 12a the member information corresponding to the user member number contained in the approval information (SB4).

Next, based on the approval information, the information accumulation and generation unit 14 accumulates first usage history information in the usage history information DB 12b (SB5). For example, it numbers a unique key by a predetermined method, generates the first usage history information by attaching this unique key to the approval information as the primary key, and adds the first usage history information to the usage history information DB 12b.

The information accumulation and generation unit 14 also acquires from the usage history information DB 12b all of the first usage history information corresponding to the user member number contained in the approval information and, based on it, generates the history information used for scoring unauthorized use (hereinafter, first analysis target history information) (SB6). The first analysis target history information includes, for example, the number of uses within the last 5 minutes, the number of uses within the last 30 minutes, the total usage amount within the last 5 minutes, the total usage amount within the last 30 minutes, the average usage amount within the last 5 minutes, and the average usage amount within the last 30 minutes, and is generated for each of the four channels.
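As a non-limiting illustration, the per-channel aggregation of SB6 might be sketched as follows. The names `Usage`, `CHANNELS`, and `window_features` are hypothetical and do not appear in the embodiment, and the sketch assumes that each first usage history record carries its acceptance time, usage amount, and channel.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The four channels named in the embodiment (English labels are illustrative).
CHANNELS = (
    "face-to-face domestic",
    "face-to-face overseas",
    "non-face-to-face domestic",
    "non-face-to-face overseas",
)


@dataclass
class Usage:
    """Hypothetical, reduced view of one first usage history record."""
    accepted_at: datetime
    amount: int
    channel: str


def window_features(history, now, minutes):
    """Return count, total, and average amount per channel for the last `minutes`."""
    start = now - timedelta(minutes=minutes)
    features = {}
    for channel in CHANNELS:
        amounts = [
            u.amount
            for u in history
            if u.channel == channel and start <= u.accepted_at <= now
        ]
        total = sum(amounts)
        features[channel] = {
            "count": len(amounts),
            "total": total,
            # Assumption: an empty window yields an average of 0.0.
            "average": total / len(amounts) if amounts else 0.0,
        }
    return features
```

Calling `window_features(history, now, 5)` and `window_features(history, now, 30)` would yield the 5-minute and 30-minute figures listed above for each channel.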

The information accumulation and generation unit 14 also generates second usage history information that uses the user member number contained in the approval information as the primary key and includes the approval acceptance date and approval acceptance time contained in the approval information, and accumulates this second usage history information in the usage history information DB 12b (SB7). The remaining information in the second usage history information (such as the elapsed time since 1 use ago) is generated by referring to the first usage history information accumulated so far in the usage history information DB 12b that corresponds to the user member number contained in the approval information. This completes the information accumulation and generation process.
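As a non-limiting illustration, the items “elapsed time since x uses ago” (x = 1, ..., n) generated in SB7 might be derived from the accumulated acceptance times as follows. The function name `elapsed_since_previous`, the use of seconds as the unit, and the use of `None` when fewer than x previous uses exist are all assumptions made for this sketch.

```python
from datetime import datetime


def elapsed_since_previous(previous_times, current, n):
    """Return elapsed seconds from each of the last n uses to the current use.

    Index 0 corresponds to 1 use ago, index n-1 to n uses ago.
    Entries are None when the member has fewer previous uses (an assumption).
    """
    # Most recent previous use first; ignore anything later than the current use.
    ordered = sorted((t for t in previous_times if t <= current), reverse=True)
    result = []
    for x in range(n):
        if x < len(ordered):
            result.append((current - ordered[x]).total_seconds())
        else:
            result.append(None)
    return result
```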

(Processing - Regression Analysis Process)
Next, the regression analysis process of FIG. 7 will be described. This process calculates a score using a scoring model constructed by applying logistic regression analysis, and is executed by the regression analysis unit 15 and its component units. FIG. 9 is a flowchart of the regression analysis process. From the approval information received in SB3 of FIG. 8, the member information acquired in SB4, the first analysis target history information generated in SB6, and the second usage history information generated in SB7, the regression analysis unit 15 acquires a predetermined plurality of data items for calculating the explanatory variables to be input to the scoring model (SC1). Examples of such data items include the approval acceptance time, the usage amount, and the product code.

The determination unit 15a of the regression analysis unit 15 then determines whether a variable category of an explanatory variable can be input to the logistic regression function, based on the composition ratio of the data categories of the data item corresponding to that variable category (SC2). A “data category” is information contained in a data item whose ordering within the data item carries no meaning; it is mainly set-type data. Examples of data categories are each card number contained in the data item “card number” (a number that uniquely identifies a credit card), each member store code contained in the data item “member store code” (a code that uniquely identifies the store or the like where the credit card was used), each industry code contained in the data item “industry code” (a code that uniquely identifies the industry of the store or the like where the credit card was used), and each product code contained in the data item “product code” (a code that uniquely identifies the product purchased with the credit card).

The “composition ratio” is the proportion that each data category accounts for among all data categories contained in the data item to which it belongs. The composition ratio can, for example, be determined in advance from past history and stored in advance in the storage unit 12 via the fraud determination terminal 40. As to whether a variable category can be input to the logistic regression function, the determination unit 15a determines, for example, that it cannot be input when the composition ratio of the data category is less than a predetermined value set in advance in the storage unit 12 (as one example, less than 1%), that is, when the composition ratio of the data category is small. In particular, the determination unit 15a here makes this determination for each minimum-unit data category contained in the data item. A “minimum-unit data category” is a data category that cannot be subdivided further; in the example of FIG. 10, each product code is a minimum-unit data category. When there is a variable category that has been determined to be unable to be input to the logistic regression function, the determination unit 15a starts the explanatory variable conversion process for the data categories of the data item corresponding to that variable category (SC3).

FIG. 10 shows various kinds of information corresponding to the data categories contained in a data item. In the example of FIG. 10, the data item is “product code” and the data categories (minimum-unit data categories) are the individual product codes. In this example, the composition ratio of the data category product code = 1001 is 35.01%, the composition ratio of the data category product code = 1002 is 0.18%, and the composition ratios of all product codes sum to 100%. When the predetermined value is 1%, the composition ratios of the data categories product code = 1002 to 1005 and 3009 in the example of FIG. 10 are below the predetermined value, so the variable categories corresponding to these data categories are determined to be variable categories that cannot be input to the logistic regression function as they are. When such variable categories exist, the explanatory variable conversion process is started for the data categories corresponding to them.
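As a non-limiting illustration, the determination of SC2 might be sketched as follows. The function names and the approval counts in the usage note are hypothetical; only the 1% threshold and the resulting ratios for product codes 1001 and 1002 are taken from the example above.

```python
def composition_ratios(counts):
    """Return the composition ratio (in percent) of each data category."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


def categories_needing_conversion(counts, threshold_percent=1.0):
    """Return the data categories whose composition ratio is below the threshold.

    These are the categories whose variable categories would not be input to
    the logistic regression function as they are.
    """
    ratios = composition_ratios(counts)
    return sorted(c for c, r in ratios.items() if r < threshold_percent)
```

For instance, with the illustrative counts `{"1001": 3501, "1002": 18, "2001": 6481}`, the ratios are 35.01%, 0.18%, and 64.81%, and only product code 1002 is selected for conversion.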

FIG. 11 is a flowchart of the explanatory variable conversion process. For the data categories of the data item corresponding to a variable category that the determination unit 15a has determined cannot be input, the probability calculation unit 15b of the regression analysis unit 15 calculates the probability that the event occurred in the past (SD1). The “probability that the event occurred in the past” is the probability of the event that the logistic regression analysis applied to the scoring model estimates; in the present embodiment it is the probability that fraud occurred in the past (hereinafter, fraud occurrence rate). This probability is calculated from the information stored in each DB. In the example of FIG. 10, the past approvals for the data category product code = 1001 comprise 100 fraudulent cases and 100,000 genuine cases, so the fraud occurrence rate is calculated as (100 / (100 + 100,000)) × 100 = 0.10% (the example of FIG. 10 rounds at the third decimal place, but the rounding method may be changed as desired).
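As a non-limiting illustration, the calculation of SD1 might be written as follows. The function name `fraud_rate_percent` and the handling of a category with no past approvals are assumptions; the figures in the usage note are those of the example above.

```python
def fraud_rate_percent(fraud_count, genuine_count):
    """Return the past fraud occurrence rate of a data category, in percent."""
    total = fraud_count + genuine_count
    if total == 0:
        # Assumption: a category with no past approvals is given a rate of 0.
        return 0.0
    return 100.0 * fraud_count / total
```

With 100 fraudulent and 100,000 genuine cases, `round(fraud_rate_percent(100, 100000), 2)` gives 0.1, that is, 0.10%.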

Next, the integerization unit 15c of the regression analysis unit 15 converts the probability calculated in this way into an integer by multiplying it by a predetermined factor (SD2). The “predetermined factor” can be set as desired, but because the probability is a percentage value, a factor of at least 100 is preferable for converting it into an integer, and a larger factor (for example, 1,000 or 10,000) is preferable so that small probabilities remain significant. This factor can, for example, be stored in advance in the storage unit 12 via the fraud determination terminal 40. In the example of FIG. 10, the value integerized in this way is shown as the “integer ratio”. FIG. 10 shows an example with a factor of 1,000, in which the fraud occurrence rate of 0.10% for the above product code is integerized as 0.10% = (1/1000), and (1/1000) × 1000 = 1 (the example of FIG. 10 truncates the first decimal place and below, but the rounding method may be changed as desired).
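As a non-limiting illustration, the integerization of SD2 might be sketched as follows. The function name `integerize` is hypothetical, and the use of decimal arithmetic (to avoid binary floating-point error before truncation) is a design choice of this sketch, not something stated in the embodiment.

```python
from decimal import Decimal, ROUND_DOWN


def integerize(rate_percent, multiplier=1000):
    """Convert a percentage rate into an integer by scaling and truncating.

    `rate_percent` may be a string such as "0.10" or a number; it is first
    turned into a probability (rate / 100) and then multiplied by `multiplier`.
    """
    probability = Decimal(str(rate_percent)) / Decimal(100)
    scaled = probability * Decimal(multiplier)
    # Truncate everything below the decimal point, as in the FIG. 10 example.
    return int(scaled.to_integral_value(rounding=ROUND_DOWN))
```

With the factor of 1,000 used in FIG. 10, `integerize("0.10")` returns 1.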

The expansion unit 15d of the regression analysis unit 15 then expands the integerized probability to generate a converted value (SD3). To “expand” means to enlarge the probability. The expansion is performed 1) based on the effectiveness of the explanatory variable of the variable category that the determination unit 15a has determined cannot be input, namely the effectiveness of that explanatory variable relative to the plurality of explanatory variables as a whole, and 2) so as to preserve the scale. “Effectiveness” is a degree (a percentage value) expressing how strongly each explanatory variable influences the probability that the given event occurs, and is set so that the effectiveness values of all explanatory variables sum to 100%. FIG. 12 shows an example of effectiveness settings for the explanatory variables. The effectiveness can, for example, be decided by the administrator of the fraud detection system 10 by any method with reference to the past fraudulent use history and the like, and stored in advance in the storage unit 12 via the fraud determination terminal 40. The “scale” is the relative magnitude of the integerized probability values; in the above example it is the relative magnitude of the fraud occurrence rates of the product codes, which are the data categories.

To satisfy these conditions, the expansion is performed, for example, by multiplying the integerized probability by a numerical value corresponding to the effectiveness of the explanatory variable to which the probability belongs. In the example of FIG. 12, the effectiveness of the explanatory variable corresponding to the data item “product code” is 12%, so the expansion ratio is set to 1.2. Accordingly, as shown under “converted value” in FIG. 10, the converted values corresponding to the integerized fraud occurrence rates 1, 10, 32, and 833 are calculated as 1 (= 1 × 1.2), 12 (= 10 × 1.2), 38 (= 32 × 1.2), and 999 (= 833 × 1.2), respectively (the example of FIG. 10 truncates the first decimal place and below, but the rounding method may be changed as desired). In addition, so that the numerical range does not exceed a predetermined range, the minimum and maximum of the integerized probabilities are obtained to identify the numerical range, and the expansion ratio is decided so that the minimum and maximum do not exceed the predetermined values. In the example of FIG. 10, the minimum integerized fraud occurrence rate is 0 and the maximum is 833, giving a numerical range of 0 to 833, and the expansion is performed so that the expanded maximum does not exceed the predetermined value of 999.

Processing of this kind makes it possible, for example, to expand a fraud rate of 0.01% into a converted value of 1 and to input this converted value to the scoring model as it is. This completes the explanatory variable conversion process of FIG. 11, and processing returns to the regression analysis process of FIG. 9.
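As a non-limiting illustration, the expansion of SD3 might be sketched as follows. The function name `expand` is hypothetical, and the mapping from an effectiveness of 12% to an expansion ratio of 1.2 is implemented here as effectiveness divided by 10, which is an assumption inferred from the single example in the text. Integer arithmetic is used so that truncation is exact.

```python
def expand(integerized_rates, effectiveness_percent, upper_limit=999):
    """Expand integerized fraud rates into converted values.

    Each rate is multiplied by (effectiveness_percent / 10) and truncated,
    which preserves the relative ordering (the scale) of the rates.
    """
    # Integer arithmetic: r * 12 // 10 equals truncation of r * 1.2 for r >= 0.
    converted = [r * effectiveness_percent // 10 for r in integerized_rates]
    if converted and max(converted) > upper_limit:
        # The embodiment chooses the ratio so that this limit is not exceeded;
        # this sketch simply reports the violation.
        raise ValueError("expanded maximum exceeds the permitted range")
    return converted


print(expand([0, 1, 10, 32, 833], 12))  # -> [0, 1, 12, 38, 999]
```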

Where necessary, the expansion unit 15d may further apply a known explanatory variable processing procedure for numerical data to the converted values generated in this way, thereby reconverting the converted values into reconverted values, and these reconverted values may be input to the scoring model. Examples of such known procedures are rounding and reconversion to set-type data. Rounding replaces a numerical value with an integer multiple of a fixed rounding width. Reconversion to set-type data replaces numerical data with set-type data. FIG. 10 shows reconverted values obtained by such processing. The example shows reconverted values obtained by applying both rounding and reconversion to set-type data: the converted values 0 and 1 are rounded to either 0 or 1 and then replaced with the reconverted value A, which is set-type data. Similarly, the converted values 12, 39, and 46 are replaced with the reconverted values B, C, and D, respectively, which are set-type data.
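As a non-limiting illustration, rounding followed by reconversion to set-type data might be sketched as follows. The function names, the rounding width of 2, the choice of rounding down, and the assignment of letters in ascending order of the rounded value are all assumptions of this sketch; the embodiment does not specify them.

```python
from string import ascii_uppercase


def round_to_width(value, width):
    """Replace a value with the largest integer multiple of `width` not above it."""
    return (value // width) * width


def to_set_type(values, width):
    """Round each converted value, then replace it with a set-type label.

    Labels A, B, C, ... are assigned in ascending order of the rounded value,
    so this sketch supports at most 26 distinct rounded values.
    """
    rounded = [round_to_width(v, width) for v in values]
    labels = {r: ascii_uppercase[i] for i, r in enumerate(sorted(set(rounded)))}
    return [labels[r] for r in rounded]


print(to_set_type([0, 1, 12, 39, 46], 2))  # -> ['A', 'A', 'B', 'C', 'D']
```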

After the explanatory variable conversion process has finished in this way, in the regression analysis process of FIG. 9 the score calculation unit 15e of the regression analysis unit 15 calculates the predicted fraud rate (score) by inputting, into a predetermined scoring model constructed by applying logistic regression analysis, both the variable categories that were not subject to the explanatory variable conversion process (the variable categories corresponding to data categories whose composition ratio is not small) and the converted or reconverted values generated by the explanatory variable conversion process (the converted or reconverted values for the variable categories corresponding to data categories whose composition ratio is small) (SC4). Since a known model can be used as the scoring model, its detailed description is omitted. This completes the regression analysis process.
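As a non-limiting illustration of SC4, a standard logistic regression function maps a linear combination of the explanatory variables to a probability between 0 and 1. The sketch below shows that general form only; the function name, the intercept, and the coefficients are hypothetical, and the embodiment does not disclose the actual model or how the probability is scaled to a score in points.

```python
import math


def predicted_fraud_rate(intercept, coefficients, variables):
    """Evaluate a logistic function on a linear combination of the inputs."""
    if len(coefficients) != len(variables):
        raise ValueError("coefficients and variables must have the same length")
    z = intercept + sum(b * x for b, x in zip(coefficients, variables))
    # Numerically stable evaluation of 1 / (1 + exp(-z)) for either sign of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

With a positive coefficient on a converted value, a category with a larger converted value (for example 999 rather than 1) yields a higher predicted fraud rate, which is how the converted values influence the score.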

(Processing - Fraud Determination Process)
Next, the fraud determination process of FIG. 7 will be described. This process refers to the score calculated using the scoring model and other information to determine whether the card use for which approval has been requested is an unauthorized use, and is executed by the fraud determination unit 16. FIG. 13 is a flowchart of the fraud determination process.

The fraud determination unit 16 determines whether the approval information received in SB3 of FIG. 8, the member information acquired in SB4, the first analysis target history information and second analysis target history information generated in SB6, and the score calculated in SC4 of FIG. 9 match a predetermined rule (SE1). A “rule” is a determination criterion representing a pattern with a high likelihood of unauthorized use; for example, a plurality of rules are set by the person in charge of the fraud detection system 10 and stored in advance in the storage unit 12. As one example, the rule “score of 600 points or more, usage amount of 100,000 yen or more, usage amount 1 use ago of 100,000 yen or more, and elapsed time since 1 use ago of 5 minutes or less” is set. The fraud determination unit 16 checks the acquired information against the rule to determine whether all of the conditions contained in the rule are satisfied and, if they are, determines that the rule is matched (a rule determined to be matched in this way is called a hit rule).
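As a non-limiting illustration, the rule matching of SE1 might be sketched as follows, using the example rule given above. The class `Rule`, its field names, the function `hit_rules`, and the rule identifier `"R001"` are hypothetical; actual rules may combine other conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: every condition must hold for the rule to hit."""
    rule_id: str
    min_score: int
    min_amount: int
    min_previous_amount: int
    max_elapsed_minutes: float


def hit_rules(rules, score, amount, previous_amount, elapsed_minutes):
    """Return the IDs of all rules whose conditions are all satisfied."""
    return [
        r.rule_id
        for r in rules
        if score >= r.min_score
        and amount >= r.min_amount
        and previous_amount >= r.min_previous_amount
        and elapsed_minutes <= r.max_elapsed_minutes
    ]


# The example rule from the text: score >= 600, amount >= 100,000 yen,
# previous amount >= 100,000 yen, and elapsed time <= 5 minutes.
EXAMPLE_RULES = [Rule("R001", 600, 100_000, 100_000, 5)]
```

An approval with a score of 650, a usage amount of 120,000 yen, a previous usage amount of 100,000 yen, and 3 minutes since the previous use would hit `"R001"`; the same approval with a score of 599 would hit no rule.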

If it is determined that one or more of the predetermined rules are matched (SE2, Yes), the fraud determination unit 16 notifies the approval host 30 that the approval for which the inquiry was requested is to be put on hold (SE3). Next, the fraud determination unit 16 generates fraud determination result information and stores it in the fraud determination result information DB 12c (SE4). This information uses the user member number of the approval information as its primary key and includes the approval information received in SB3 of FIG. 8, the member information acquired in SB4, the first analysis target history information generated in SB6, the second usage history information generated in SB7, the score calculated in SC4 of FIG. 9, and a hit rule ID that uniquely identifies the hit rule determined in SE1.

Next, the fraud determination unit 16 transmits the generated fraud determination result information to the fraud determination terminal 40 (SE5). The fraud determination result information is then output to a monitor provided on the fraud determination terminal 40. The operator refers to it to make the final decision on whether the card use under inquiry is fraudulent, and sends this fraud determination result to the fraud detection system 10 via the fraud determination terminal 40. On receiving the fraud determination result (SE6, Yes), the fraud determination unit 16 transmits it to the approval host 30 (SE7) and ends the fraud determination process.
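As a rough illustration of steps SE2 to SE7, the following Python sketch records a held approval keyed on the member number and later attaches the operator's final decision. The in-memory dictionary standing in for the fraud determination result information DB 12c, the field names, and the function names are all assumptions made for illustration. The sketch handles only the "Yes" branch of SE2 that the passage describes.

```python
from typing import Dict, List, Optional


def record_hold(member_no: str, score: int, hit_rule_ids: List[str],
                store: Dict[str, dict]) -> Optional[dict]:
    """SE2 to SE4: if any rule was hit, build a result record and store it.

    Returns None when no rule was hit; what happens in that branch is
    outside this sketch.
    """
    if not hit_rule_ids:
        return None
    record = {
        "member_no": member_no,          # primary key, per the text
        "score": score,
        "hit_rule_ids": list(hit_rule_ids),
        "status": "on_hold",             # SE3: approval is put on hold
        "final_result": None,            # filled in after operator review
    }
    store[member_no] = record            # SE4: store in the (assumed) DB
    return record


def apply_operator_decision(store: Dict[str, dict], member_no: str,
                            is_fraud: bool) -> dict:
    """SE6 to SE7: attach the operator's final decision to the record."""
    record = store[member_no]
    record["final_result"] = "fraud" if is_fraud else "genuine"
    record["status"] = "decided"
    return record
```

In a real deployment the record would also carry the approval, member, and history information listed above; they are omitted here to keep the sketch short.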

(Processing - system management process)
Finally, the system management process of FIG. 7 will be described. This process analyzes and evaluates the accuracy of the scoring model and updates the scoring model as necessary. It is executed by the system management unit 17. FIG. 14 is a flowchart of the system management process.

The system management unit 17 generates scoring information and stores it in the scoring information DB 12d (SF1). This information uses the user member number of the approval information as its primary key and includes the approval information received in SB3 of FIG. 8, the explanatory variables input to the scoring model in SC3 of FIG. 9, and the score calculated in SC4 of FIG. 9. Then, in response to a request from the system management terminal 50 (SF2, Yes), the system management unit 17 acquires this scoring information from the scoring information DB 12d and transmits it to the system management terminal 50 (SF3). The scoring information is then output to a monitor provided on the system management terminal 50, and the system administrator refers to it to analyze the accuracy of the scoring model by a predetermined method. The system administrator reports on this accuracy periodically and, if the scoring model has degraded, sends update data from the system management terminal 50 to the fraud detection system 10 to change the parameters of the scoring model, thereby maintaining and improving its accuracy. For example, the multiplier used for integerization in SD2 of FIG. 11 may be revised, or the effectiveness of each explanatory variable illustrated in FIG. 12 may be reviewed. The system management unit 17 updates the scoring model and the like based on the update data transmitted from the system management terminal 50 (SF5). This ends the system management process.
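The specification leaves the accuracy analysis as "a predetermined method". Purely as an illustration of what such monitoring could look like, the Python sketch below computes precision and recall of the score at a cutoff, assuming each scoring record can be paired with a confirmed fraud or genuine outcome. The metric choice, the function name `accuracy_at_threshold`, and the sample data are assumptions, not part of the described system.

```python
from typing import Iterable, Tuple


def accuracy_at_threshold(records: Iterable[Tuple[int, bool]],
                          threshold: int) -> Tuple[float, float]:
    """Return (precision, recall) when scores >= threshold are flagged.

    Each record is (score, confirmed_fraud). Pairing scores with confirmed
    outcomes is an assumption made for this sketch.
    """
    tp = fp = fn = 0
    for score, is_fraud in records:
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += 1          # flagged and actually fraudulent
        elif flagged and not is_fraud:
            fp += 1          # flagged but genuine
        elif is_fraud:
            fn += 1          # fraudulent but not flagged
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical sample: (score, confirmed_fraud)
SAMPLE = [(700, True), (650, False), (620, True), (400, True), (300, False)]
```

Tracking such figures across reporting periods is one way an administrator might notice that the model has degraded and that its parameters need updating.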

(Modifications to the embodiment)
Although an embodiment of the present invention has been described above, the specific configuration and means of the present invention may be modified and improved as desired within the scope of the technical idea of each invention described in the claims. Such modifications are described below.

(Modification - distribution and integration)
The electrical components described above are functional concepts and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the units is not limited to that illustrated, and all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, some of the functions of the fraud detection system 10 may be provided in the approval host 30 or the fraud determination terminal 40, or the functions of the fraud detection system 10 may be distributed across a plurality of computers. In addition, parts of the processes may be interchanged, for example by performing in another process a step described as part of the regression analysis process.

(Explanatory variable conversion process)
In the above embodiment, the determination is made based on the composition ratio of minimum-unit data categories, but the composition ratio need only be that of data categories at whatever unit of subdivision is desired. For example, even if the data categories are not the minimum unit, the composition ratio of data categories that are subdivided more finely than in conventional practice, to a degree that improves analysis accuracy over conventional practice, may be used. Alternatively, the unit of subdivision may be set in advance for each data item, and the determination may be made in a different unit for each data item by referring to this setting.
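The following Python sketch illustrates the idea of checking composition ratios at a configurable unit of subdivision, and then turning the past fraud rate of a low-ratio category into an integer, following the percentage and 100-times multiplier recited in the claims. The `to_category` mapping, the threshold value, the function names, and the use of rounding for integerization are assumptions made for illustration. The expansion step based on effectiveness is not shown.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Set


def composition_ratios(values: Iterable[str],
                       to_category: Callable[[str], str] = lambda v: v
                       ) -> Dict[str, float]:
    """Share of each data category among all observed values.

    `to_category` sets the unit of subdivision for this data item, e.g. the
    full code (minimum unit) or a coarser prefix. It is a hypothetical hook.
    """
    counts = Counter(to_category(v) for v in values)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def categories_needing_conversion(values: Iterable[str], min_ratio: float,
                                  to_category: Callable[[str], str] = lambda v: v
                                  ) -> Set[str]:
    """Categories whose composition ratio is below the predetermined value."""
    ratios = composition_ratios(values, to_category)
    return {cat for cat, ratio in ratios.items() if ratio < min_ratio}


def integerized_fraud_rate(fraud_count: int, genuine_count: int,
                           multiplier: int = 100) -> int:
    """Past fraud rate as a percentage, scaled by `multiplier` to an integer.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    total = fraud_count + genuine_count
    if total == 0:
        raise ValueError("no past uses recorded for this data category")
    percent = 100.0 * fraud_count / total
    return round(percent * multiplier)
```

For example, a category with 3 fraudulent and 197 genuine past uses has a fraud rate of 1.5 percent, which becomes 150 with a multiplier of 100. Passing a coarser `to_category` for one data item and the identity function for another corresponds to using a different unit of subdivision for each data item.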

DESCRIPTION OF SYMBOLS
1 Approval system
10 Fraud detection system
11 Network IF
12 Storage unit
12a Member information DB
12b Usage history information DB
12c Fraud determination result information DB
12d Scoring information DB
13 Control unit
14 Information accumulation and generation unit
15 Regression analysis unit
15a Conversion target determination unit
15b Probability calculation unit
15c Integerization unit
15d Expansion unit
15e Score calculation unit
16 Fraud determination unit
17 System management unit
20 Business host
30 Approval host
40 Fraud determination terminal
50 System management terminal
60 Network

Claims

A logistic regression analysis system for calculating the probability of occurrence of a predetermined event by inputting, into a predetermined logistic regression function, variable categories of explanatory variables that can affect the probability, the variable categories corresponding to data categories included in each of a plurality of data items, the system comprising:
composition ratio storage means for storing a composition ratio of the data category of the data item corresponding to the variable category;
event occurrence probability storage means for storing, for the data category, information necessary for calculating the probability that the event occurred in the past;
effectiveness storage means for storing a numerical value indicating the effectiveness of each of the plurality of explanatory variables relative to the plurality of explanatory variables as a whole;
determination means for determining whether a variable category of the explanatory variable can be input to the logistic regression function, wherein, when a plurality of predetermined data items for calculating the explanatory variables to be input to the logistic regression function are acquired by a predetermined method, the determination means acquires the composition ratio of the data category of the data item from the composition ratio storage means, determines whether the acquired composition ratio is less than a predetermined value, and, when the acquired composition ratio is less than the predetermined value, determines that the variable category corresponding to the data category of the data item is a variable category that cannot be input to the logistic regression function;
probability calculation means for acquiring, from the event occurrence probability storage means, the information corresponding to the data category of the data item corresponding to the variable category determined by the determination means to be non-inputtable, and calculating, based on the acquired information, the probability that the event occurred in the past for that data category;
integerization means for converting the probability calculated by the probability calculation means into an integer using a predetermined multiplier; and
expansion means for acquiring, from the effectiveness storage means, the numerical value indicating the effectiveness corresponding to the explanatory variable of the variable category determined by the determination means to be non-inputtable, and expanding the probability converted into an integer by the integerization means based on the acquired numerical value, thereby generating a converted value of the variable category of the explanatory variable corresponding to the data category for which the probability was calculated.

The logistic regression analysis system according to claim 1, wherein
the composition ratio storage means stores the composition ratio of a minimum-unit data category included in the data item, and
the determination means acquires the composition ratio of the minimum-unit data category included in the data item from the composition ratio storage means, determines whether the acquired composition ratio is less than a predetermined value, and, when the acquired composition ratio is less than the predetermined value, determines that the variable category corresponding to the data category of the data item is a variable category that cannot be input to the logistic regression function.

The logistic regression analysis system according to claim 1 or 2, wherein
the probability calculation means calculates the probability that the event occurred in the past as a percentage value, and
the integerization means converts the probability calculated by the probability calculation means into an integer using a predetermined multiplier of 100.

The logistic regression analysis system according to any one of claims 1 to 3, the system being a system that calculates a score corresponding to the probability of fraudulent use of a credit card using a scoring model to which the logistic regression function is applied, and further comprising
score calculation means for calculating the score by inputting, into the scoring model, the converted value of the variable category of the explanatory variable generated by the expansion means, wherein
the event occurrence probability storage means stores, as the information necessary for calculating the probability that the event occurred in the past, information associating the data category with the number of fraudulent uses and the number of genuine uses when the credit card was used in the past for that data category, and
the probability calculation means acquires, from the event occurrence probability storage means, the number of fraudulent uses and the number of genuine uses corresponding to the data category of the data item corresponding to the variable category determined by the determination means to be non-inputtable, and, based on the acquired numbers, calculates the probability of fraud when the credit card was used in the past for that data category as the probability that the event occurred in the past.

A logistic regression analysis program for calculating the probability of occurrence of a predetermined event by inputting, into a predetermined logistic regression function, variable categories of explanatory variables that can affect the probability, the variable categories corresponding to data categories included in each of a plurality of data items, the program causing a computer comprising:
composition ratio storage means for storing a composition ratio of the data category of the data item corresponding to the variable category;
event occurrence probability storage means for storing, for the data category, information necessary for calculating the probability that the event occurred in the past; and
effectiveness storage means for storing a numerical value indicating the effectiveness of each of the plurality of explanatory variables relative to the plurality of explanatory variables as a whole,
to function as:
determination means for determining whether a variable category of the explanatory variable can be input to the logistic regression function, wherein, when a plurality of predetermined data items for calculating the explanatory variables to be input to the logistic regression function are acquired by a predetermined method, the determination means acquires the composition ratio of the data category of the data item from the composition ratio storage means, determines whether the acquired composition ratio is less than a predetermined value, and, when the acquired composition ratio is less than the predetermined value, determines that the variable category corresponding to the data category of the data item is a variable category that cannot be input to the logistic regression function;
probability calculation means for acquiring, from the event occurrence probability storage means, the information corresponding to the data category of the data item corresponding to the variable category determined by the determination means to be non-inputtable, and calculating, based on the acquired information, the probability that the event occurred in the past for that data category;
integerization means for converting the probability calculated by the probability calculation means into an integer using a predetermined multiplier; and
expansion means for acquiring, from the effectiveness storage means, the numerical value indicating the effectiveness corresponding to the explanatory variable of the variable category determined by the determination means to be non-inputtable, and expanding the probability converted into an integer by the integerization means based on the acquired numerical value, thereby generating a converted value of the variable category of the explanatory variable corresponding to the data category for which the probability was calculated.