JP6711519B2

JP6711519B2 - Evaluation device, evaluation method and program

Info

Publication number: JP6711519B2
Application number: JP2016126008A
Authority: JP
Inventors: 平松　直人
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2016-06-24
Filing date: 2016-06-24
Publication date: 2020-06-17
Anticipated expiration: 2036-06-24
Also published as: JP2017228255A

Description

The present invention relates to an evaluation device and an evaluation method for evaluating anonymized information, and to a program for realizing them.

In recent years, advances in information technology have made personal information very easy to leak. The importance of protecting personal information is therefore being emphasized, and it has been proposed to process personal information so that individuals become difficult to identify. k-anonymization is known as one such process.

In k-anonymization, data is processed so that at least k records with the same attributes exist in the target data. For example, suppose that personal data has the items postal code, sex, age, and hobby, and that k=3 is set. When k-anonymization is executed in this case, the data is processed, for instance by deleting the last few digits of the postal code or rounding up the age, so that the number of individuals whose items all match is at least k=3.
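The generalization described above can be sketched as follows. This is a minimal illustration and not the method of any cited document: the field names, the number of postal-code digits kept, and the age step are all assumptions chosen for the example.

```python
from collections import Counter

def generalize(record, zip_digits_kept=3, age_step=10):
    """Coarsen one record: mask trailing postal-code digits and round the age up."""
    postal = record["postal"]
    masked = postal[:zip_digits_kept] + "*" * (len(postal) - zip_digits_kept)
    age = -(-record["age"] // age_step) * age_step  # round up to a multiple of age_step
    return (masked, record["sex"], age, record["hobby"])

def is_k_anonymous(records, k, **options):
    """True if every combination of generalized values occurs at least k times."""
    counts = Counter(generalize(r, **options) for r in records)
    return all(count >= k for count in counts.values())

# Hypothetical sample data.
records = [
    {"postal": "1010021", "sex": "F", "age": 23, "hobby": "tennis"},
    {"postal": "1010047", "sex": "F", "age": 27, "hobby": "tennis"},
    {"postal": "1010063", "sex": "F", "age": 30, "hobby": "tennis"},
]

print(is_k_anonymous(records, k=3))                                # generalized data
print(is_k_anonymous(records, k=3, zip_digits_kept=7, age_step=1)) # raw data
```

With the generalization applied, all three records share the same values and satisfy k=3; with the raw values they do not.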

In this connection, Patent Document 1, for example, discloses determining whether excessive information loss has occurred when anonymization is applied to a database, and calculating information usefulness from the result of that determination.

Patent Document 2 discloses an information anonymization system that analyzes the risk level at which an individual can be uniquely identified from anonymized data containing personal information. The disclosed system quantitatively analyzes the risk of each record in the anonymized data, calculates the risk level of the anonymized data according to a specific measure based on the per-record risks, and outputs the calculated risk level.

Patent Document 1: JP 2013-190838 A
Patent Document 2: JP 2015-176496 A

The higher the value of k is set, the lower the risk of leakage of personal information, but the lower the usefulness of the personal information when it is used. Conversely, the lower the value of k is set, the higher the usefulness of the personal information, but the higher the risk of leakage.

Patent Document 1 determines whether excessive information loss has occurred in anonymized personal data and calculates the usefulness of the anonymized personal information, but it does not consider preventing the risk of leakage of personal information.

Patent Document 2 addresses the leakage risk of personal information by analyzing the risk level with a risk analysis device, but it does not consider that the usefulness of the information may be lost if the leakage risk is prevented excessively.

It is therefore necessary to evaluate both information usefulness and leakage risk, and to set k to an appropriate value.

An example of an object of the present invention is to solve the above problems and to provide an evaluation device, an evaluation method, and a program that can evaluate data that has undergone k-anonymization.

To achieve the above object, an evaluation device according to one aspect of the present invention is an evaluation device that evaluates, as target data, data including personal information acquired from a plurality of individuals, the evaluation device comprising:
a personal identification risk evaluation unit that evaluates, as a personal identification risk, the possibility that the existence of an individual from whom the personal information was acquired can be ascertained from the target data;
an individual identification risk evaluation unit that evaluates, as an individual identification risk, the possibility that an individual from whom the personal information was acquired can be identified from the target data; and
a usefulness evaluation unit that evaluates the usefulness of the target data when anonymization processing is performed on it.

To achieve the above object, an evaluation method according to one aspect of the present invention is an evaluation method that evaluates, as target data, data including personal information acquired from a plurality of individuals, the evaluation method comprising:
(a) a step of evaluating, as a personal identification risk, the possibility that the existence of an individual from whom the personal information was acquired can be ascertained from the target data;
(b) a step of evaluating, as an individual identification risk, the possibility that an individual from whom the personal information was acquired can be identified from the target data; and
(c) a step of evaluating the usefulness of the target data when anonymization processing is performed on it.

To achieve the above object, a program according to one aspect of the present invention is a program for causing a computer to evaluate, as target data, data including personal information acquired from a plurality of individuals, the program causing the computer to execute:
(a) a step of evaluating, as a personal identification risk, the possibility that the existence of an individual from whom the personal information was acquired can be ascertained from the target data;
(b) a step of evaluating, as an individual identification risk, the possibility that an individual from whom the personal information was acquired can be identified from the target data; and
(c) a step of evaluating the usefulness of the target data when anonymization processing is performed on it.

As described above, according to the present invention, data that has undergone k-anonymization can be evaluated.

FIG. 1 is a block diagram showing a schematic configuration of an evaluation device according to the first embodiment of the present invention.
FIG. 2 is a block diagram specifically showing the evaluation device according to the present embodiment.
FIG. 3 is a flowchart showing the operation of the evaluation device according to the embodiment of the present invention.
FIG. 4 is a diagram showing an example of the evaluation of the personal identification risk obtained in the embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of the personal identification risk evaluation process performed in the embodiment of the present invention; FIGS. 5(a) to 5(c) show the flow of a series of processes.
FIG. 6 is a diagram illustrating another example of the personal identification risk evaluation process performed in the embodiment of the present invention; FIGS. 6(a) to 6(d) show the flow of a series of processes.
FIG. 7 is a diagram illustrating another example of the personal identification risk evaluation process performed in the embodiment of the present invention; FIGS. 7(a) to 7(e) show the flow of a series of processes.
FIG. 8 is a diagram for explaining the individual identification risk evaluation process performed in the embodiment of the present invention, and shows an example of personal information.
FIG. 9 is a diagram showing an example of the evaluation of usefulness performed in the embodiment of the present invention.
FIG. 10 is a block diagram showing an example of a computer that realizes the evaluation device according to the embodiment of the present invention.

(Embodiment)
An evaluation device, an evaluation method, and a program according to an embodiment of the present invention are described below with reference to FIGS. 1 to 10.

[Device configuration]
First, a schematic configuration of the evaluation device according to the present embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing a schematic configuration of an evaluation device according to the first embodiment of the present invention.

The evaluation device 10 according to the present embodiment shown in FIG. 1 is a device that evaluates, as target data, data including the personal information of a plurality of individuals. As shown in FIG. 1, the evaluation device 10 includes a personal identification risk evaluation unit 11, an individual identification risk evaluation unit 12, and a usefulness evaluation unit 13.

The personal identification risk evaluation unit 11 evaluates, as the personal identification risk, the possibility that the existence of an individual from whom personal information was acquired can be ascertained from the target data. The individual identification risk evaluation unit 12 evaluates, as the individual identification risk, the possibility that an individual from whom personal information was acquired can be identified from the target data. The usefulness evaluation unit 13 evaluates the usefulness of the target data when anonymization processing is performed on it.

Here, "distinguishing" (識別), the sense underlying the personal identification risk, is defined as knowing that a piece of information belongs to some one person. "Specifying" (特定), the sense underlying the individual identification risk, is defined as knowing whose information it is. "Distinguishing" is the broader concept: if an individual has been specified, that individual has necessarily been distinguished. k-anonymization can be described as a technique that makes distinguishing difficult in order to prevent specifying.

As described above, in the present embodiment the data is evaluated on three points: personal identification risk, individual identification risk, and usefulness. The present embodiment thus makes it possible to evaluate data that has undergone k-anonymization.

Next, the evaluation system 1 according to the present embodiment is described more specifically with reference to FIG. 2. FIG. 2 is a block diagram specifically showing the evaluation device according to the present embodiment.

As shown in FIG. 2, in the present embodiment a database 20 that manages personal information and a terminal device 30 used by an evaluator are connected to the evaluation device 10 via a network or the like.

The database 20 stores personal information. The database 20 may also store anonymized personal information. Personal information is made up of a plurality of quasi-identifiers, such as address, name, telephone number, age, and nationality, each of which could contribute to identifying an individual. Anonymization is performed, for example, by changing the contents of the quasi-identifiers so that, when the set level is A, there are A individuals who share the same quasi-identifier values.

As shown in FIG. 2, in the present embodiment the evaluation device 10 includes a data acquisition unit 14 and a data output unit 15 in addition to the personal identification risk evaluation unit 11, the individual identification risk evaluation unit 12, and the usefulness evaluation unit 13 described above.

The data acquisition unit 14 acquires personal information or anonymized personal information from the database 20 as the target data. The data output unit 15 transmits to the terminal device 30 the personal identification risk obtained from the personal identification risk evaluation unit 11, the individual identification risk obtained from the individual identification risk evaluation unit 12, and the usefulness obtained from the usefulness evaluation unit 13. As a result, each risk and the usefulness of the personal information are displayed on the screen of the terminal device 30.

In the present embodiment, the personal identification risk evaluation unit 11 evaluates the personal identification risk by counting the records in the target data whose quasi-identifier values match and, from those counts, obtaining the number of people who would be narrowed down to k persons if k-anonymization were performed on the target data. In other words, the personal identification risk evaluation unit 11 counts the records that can be distinguished by their quasi-identifiers, and calculates as the personal identification risk how many records can be distinguished as belonging to some one person.

In the present embodiment, the individual identification risk evaluation unit 12 evaluates the individual identification risk based on the personal identification risk obtained by the personal identification risk evaluation unit 11 and on a coefficient, set in advance for each quasi-identifier, that indicates the risk of an individual being identified from that quasi-identifier. In other words, the individual identification risk evaluation unit 12 takes the evaluation result for the personal identification risk and additionally considers how much the attacker knows about the quasi-identifiers in order to calculate the individual identification risk.

In the present embodiment, the usefulness evaluation unit 13 evaluates usefulness using the number of records in the target data and the number of records in the target data when anonymization processing has been performed on it. Specifically, the usefulness evaluation unit 13 compares the number of records before anonymization with the number of records after anonymization, and evaluates how many records were deleted by the anonymization.

[Device operation]
Next, an example of the operation of the evaluation device 10 according to the present embodiment is described with reference to FIG. 3. FIG. 3 is a flowchart showing the operation of the evaluation device according to the embodiment of the present invention. FIGS. 1 and 2 are referred to as appropriate in the following description. In the present embodiment, the evaluation method is carried out by operating the evaluation device 10, so the following description of the operation of the evaluation device 10 also serves as the description of the evaluation method.

As shown in FIG. 3, the data acquisition unit 14 first acquires personal information from the database 20 as the target data (step A1). The data acquisition unit 14 then inputs the acquired target data to the personal identification risk evaluation unit 11, the individual identification risk evaluation unit 12, and the usefulness evaluation unit 13.

Next, the personal identification risk evaluation unit 11 evaluates, as the personal identification risk, the possibility that the existence of an individual from whom personal information was acquired can be ascertained from the target data acquired in step A1 (step A2). The personal identification risk evaluation unit 11 inputs the result to the data output unit 15.

Next, the individual identification risk evaluation unit 12 evaluates, as the individual identification risk, the possibility that an individual from whom personal information was acquired can be identified from the target data (step A3). The individual identification risk evaluation unit 12 also inputs its result to the data output unit 15.

Next, the usefulness evaluation unit 13 evaluates the usefulness of the target data when anonymization processing is performed on it (step A4). The usefulness evaluation unit 13 also inputs its result to the data output unit 15.

Next, the data output unit 15 outputs the evaluations made in steps A2 to A4 to the terminal device 30 (step A5).
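The flow of steps A1 to A5 can be sketched as a small orchestration function. This is only an illustration under stated assumptions: the function and key names are hypothetical, the five units are passed in as plain callables, and step A3 is assumed to receive the result of step A2 as well as the target data.

```python
def run_evaluation(fetch, eval_personal, eval_individual, eval_usefulness, send):
    """Run steps A1 to A5 once and return the report sent to the terminal."""
    target = fetch()                                    # A1: data acquisition unit 14
    personal_risk = eval_personal(target)               # A2: evaluation unit 11
    individual_risk = eval_individual(target, personal_risk)  # A3: evaluation unit 12
    usefulness = eval_usefulness(target)                # A4: evaluation unit 13
    report = {
        "personal_identification_risk": personal_risk,
        "individual_identification_risk": individual_risk,
        "usefulness": usefulness,
    }
    send(report)                                        # A5: data output unit 15
    return report

# Hypothetical stand-ins for the database, the three evaluators, and the terminal.
sent = []
report = run_evaluation(
    fetch=lambda: ["rec1", "rec2"],
    eval_personal=lambda data: len(data),
    eval_individual=lambda data, k: (1 / k) * 0.5,
    eval_usefulness=lambda data: 1.0,
    send=sent.append,
)
print(report)
```

Passing the units as callables keeps the sketch independent of how each evaluation is actually computed.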

Each of steps A2 to A4 is described in more detail below.

[Step A2: Personal identification risk evaluation process]
First, the evaluation of the personal identification risk by the personal identification risk evaluation unit 11 is described with reference to FIGS. 4 to 7. FIG. 4 is a diagram showing an example of the evaluation of the personal identification risk obtained in the embodiment of the present invention. FIG. 5 is a diagram illustrating an example of the personal identification risk evaluation process performed in the embodiment of the present invention; FIGS. 5(a) to 5(c) show the flow of a series of processes. FIG. 6 is a diagram illustrating another example of the personal identification risk evaluation process; FIGS. 6(a) to 6(d) show the flow of a series of processes. FIG. 7 is a diagram illustrating a further example of the personal identification risk evaluation process; FIGS. 7(a) to 7(e) show the flow of a series of processes.

As shown in FIG. 4, in the present embodiment the target data of the personal identification risk evaluation unit 11 is the data obtained by applying k-anonymization to specific quasi-identifier columns of a table in which personal information is recorded. For example, the first row in FIG. 4 indicates that the target data is the anonymized data obtained by performing k-anonymization with k=10 on the "age" and "sex" data of "post-anonymization table_pattern 1".

The personal identification risk evaluation unit 11 counts the records in the target data whose values for the specific quasi-identifiers match and, from those counts, obtains the number of people who would be narrowed down to k persons if k-anonymization were performed on the target data. As shown in FIG. 4, this "number of people" serves as the evaluation of the personal identification risk.

Specifically, suppose for example that the specific quasi-identifiers in the target data are a combination of single values. A "single value" is single-attribute data: a quasi-identifier, such as age or sex, for which an individual has only one value. In this case, as shown in FIG. 5, the personal identification risk evaluation unit 11 obtains from the target data (see FIG. 5(a)) the number of matching users (the number of records) for each combination of the identifiers Ql1 and Ql2 (see FIG. 5(b)). The personal identification risk evaluation unit 11 then obtains, for each value of k, the number of users who are narrowed down to k persons (see FIG. 5(c)).
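The single-value case can be sketched as follows. The column names and sample values are hypothetical, and "narrowed down to k persons" is interpreted here, as an assumption, as belonging to a group of exactly k records with identical quasi-identifier values.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count records per combination of quasi-identifier values (cf. FIG. 5(b))."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def users_per_k(sizes):
    """For each group size k, total the users who sit in groups of exactly k (cf. FIG. 5(c))."""
    per_k = Counter()
    for size in sizes.values():
        per_k[size] += size
    return dict(sorted(per_k.items()))

# Hypothetical target data with two single-valued quasi-identifiers.
records = [
    {"age": "20s", "sex": "F"},
    {"age": "20s", "sex": "F"},
    {"age": "20s", "sex": "M"},
    {"age": "30s", "sex": "M"},
    {"age": "30s", "sex": "M"},
    {"age": "30s", "sex": "M"},
]

sizes = group_sizes(records, ["age", "sex"])
print(users_per_k(sizes))  # {1: 1, 2: 2, 3: 3}
```

Here one user is unique (k=1), two users are indistinguishable from each other (k=2), and three share the same values (k=3).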

Alternatively, suppose that the specific identifier in the target data is a set value. A "set value" is compound-attribute data: a quasi-identifier, such as type of illness or area (workplace location, place of residence), for which an individual may have multiple values. In this case, as shown in FIG. 6, the personal identification risk evaluation unit 11 determines from the target data (see FIG. 6(a)) the combination of values of the identifier "region" that applies to each user (see FIG. 6(b)). Next, the personal identification risk evaluation unit 11 obtains the number of matching users (the number of records) for each combination of "region" values (see FIG. 6(c)). The personal identification risk evaluation unit 11 then obtains, for each value of k, the number of users who are narrowed down to k persons (see FIG. 6(d)).
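The set-value case can be sketched in the same spirit. The (user, region) row layout and the sample values are assumptions; each user's regions are collected into an order-independent set before counting, and groups of exactly k users are again assumed.

```python
from collections import Counter, defaultdict

def region_sets(rows):
    """Collect each user's regions into one order-independent combination (cf. FIG. 6(b))."""
    per_user = defaultdict(set)
    for user, region in rows:
        per_user[user].add(region)
    return {user: frozenset(regions) for user, regions in per_user.items()}

def users_per_k(combos):
    """Count users per combination, then total users per group size k (cf. FIG. 6(c), (d))."""
    sizes = Counter(combos.values())
    per_k = Counter()
    for size in sizes.values():
        per_k[size] += size
    return dict(sorted(per_k.items()))

# Hypothetical target data: one row per (user, region) pair.
rows = [
    ("u1", "Tokyo"), ("u1", "Osaka"),
    ("u2", "Osaka"), ("u2", "Tokyo"),
    ("u3", "Tokyo"),
]

combos = region_sets(rows)
print(users_per_k(combos))  # {1: 1, 2: 2}
```

Using `frozenset` means that users u1 and u2 fall into the same group even though their regions are listed in a different order.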

Alternatively, suppose that the specific identifiers in the target data are a combination of a single value and a set value. In this case, as shown in FIG. 7, the personal identification risk evaluation unit 11 first joins the set-value part of the target data (see FIG. 7(a)) with the single-value part of the target data (see FIG. 7(b)) (see FIG. 7(c)). Next, the personal identification risk evaluation unit 11 obtains the number of matching users (the number of records) for each combination of the identifier qi1 and the identifier "region" (see FIG. 7(c)). The personal identification risk evaluation unit 11 then obtains, for each value of k, the number of users who are narrowed down to k persons (see FIG. 7(d)).
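The combined case can be sketched as a join on a user key followed by the same counting. The user key, the data layout, and the sample values are hypothetical, and every user in the region rows is assumed to also appear in the single-value part.

```python
from collections import Counter

def join_parts(single, region_rows):
    """Join each user's single value with the set of that user's regions."""
    joined = {user: (value, set()) for user, value in single.items()}
    for user, region in region_rows:
        joined[user][1].add(region)
    return {user: (value, frozenset(regions)) for user, (value, regions) in joined.items()}

def users_per_k(joined):
    """Count users per (single value, region set) pair, then total users per group size k."""
    sizes = Counter(joined.values())
    per_k = Counter()
    for size in sizes.values():
        per_k[size] += size
    return dict(sorted(per_k.items()))

# Hypothetical single-value part (user -> qi1) and set-value part (user, region).
single = {"u1": "20s", "u2": "20s", "u3": "30s", "u4": "20s"}
region_rows = [
    ("u1", "Tokyo"), ("u1", "Osaka"),
    ("u2", "Osaka"), ("u2", "Tokyo"),
    ("u3", "Tokyo"),
    ("u4", "Tokyo"),
]

joined = join_parts(single, region_rows)
print(users_per_k(joined))  # {1: 2, 2: 2}
```

Users u3 and u4 share the region set but differ in qi1, so each is unique once the two parts are combined.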

[Step A3: Individual identification risk evaluation process]
Next, the evaluation of the individual identification risk by the individual identification risk evaluation unit 12 is described with reference to FIG. 8. FIG. 8 is a diagram for explaining the individual identification risk evaluation process performed in the embodiment of the present invention, and shows an example of personal information.

The individual identification risk evaluation unit 12 evaluates the individual identification risk based on the personal identification risk obtained by the personal identification risk evaluation unit 11 and on a coefficient, set in advance for each quasi-identifier, that indicates the risk of an individual being identified from that quasi-identifier. Specifically, the individual identification risk evaluation unit 12 calculates the individual identification risk f by, for example, Equation 1 below.

(Equation 1)
f = (1/k) × r

In Equation 1, k is the personal identification risk calculated by the personal identification risk evaluation unit 11, and r is a coefficient indicating the risk of an individual being identified from each quasi-identifier. Hereinafter, r is referred to as the "evaluation parameter".
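Equation 1 can be written directly as a function. The numeric values of r below are hypothetical, since the text gives no concrete values for the evaluation parameter.

```python
def individual_risk_eq1(k, r):
    """Equation 1: f = (1/k) * r, with k the personal identification risk and r the evaluation parameter."""
    if k <= 0:
        raise ValueError("k must be positive")
    return (1 / k) * r

# Hypothetical parameters: the same r with a larger k gives a smaller f.
f_small_group = individual_risk_eq1(k=2, r=0.5)
f_large_group = individual_risk_eq1(k=10, r=0.5)
print(f_small_group, f_large_group)
```

For a fixed r, f falls as k grows, which matches the idea that an individual hidden among more people is harder to pin down.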

The evaluation parameter is set as appropriate based on the content of the quasi-identifier, the value of the quasi-identifier, where the target data will be used, and so on. For example, suppose the personal information is as shown in FIG. 8.

In the example of FIG. 8, the quasi-identifiers age, sex, month of treatment, and injury or illness may also be used in other, external data. Of these, age and sex are considered more likely than injury or illness to appear in other data. Therefore, when the specific identifier in the target data is injury or illness, an individual is less likely to be identified than when the specific identifiers are age and sex, and the evaluation parameter r is set to a smaller value.

Furthermore, when the specific identifier in the target data is injury or illness, heart disease is considered less likely than a cold to appear in other data. Therefore, even when the specific identifier in the target data is injury or illness, the evaluation parameter r is set to a large value when the number of records for a cold is relatively large, and to a small value when the number of records for a cold is relatively small.

The individual identification risk evaluation unit 12 can also calculate the individual identification risk f by Equation 2 below.

(Equation 2)
f = k × R

In Equation 2, k is, as in Equation 1, the personal identification risk calculated by the personal identification risk evaluation unit 11, and R is the sharing rate. The sharing rate indicates the probability that an attacker knows the identifier in question as prior knowledge, and is set in the range from 0 to 1 inclusive. For example, when the identifier is sex alone, it is easily known, so R is set to 0.9; when it is age alone, R is set to 0.7. When the identifier is the combination of age and sex, it is somewhat harder to know, so R is set to, for example, 0.5.
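Equation 2 with the example sharing rates from the text can be sketched as a lookup table. The table layout, the identifier labels, and the function name are assumptions; only the three rates (0.9, 0.7, 0.5) come from the description.

```python
# Sharing rate R per set of identifiers, using the example values given in the text.
SHARING_RATE = {
    frozenset({"sex"}): 0.9,
    frozenset({"age"}): 0.7,
    frozenset({"age", "sex"}): 0.5,
}

def individual_risk_eq2(k, identifiers):
    """Equation 2: f = k * R, with R looked up for the given identifiers."""
    rate = SHARING_RATE[frozenset(identifiers)]
    if not 0.0 <= rate <= 1.0:
        raise ValueError("sharing rate must lie in [0, 1]")
    return k * rate

print(individual_risk_eq2(10, ["sex"]))
print(individual_risk_eq2(10, ["sex", "age"]))
```

Keying the table by `frozenset` makes the lookup independent of the order in which the identifiers are listed.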

[Step A4: Usefulness evaluation process]
Next, the evaluation of usefulness by the usefulness evaluation unit 13 is described with reference to FIG. 9. FIG. 9 is a diagram showing an example of the evaluation of usefulness performed in the embodiment of the present invention. In FIG. 9, the horizontal axis shows the value of k in k-anonymization, and the vertical axis shows the rate of reduction in the number of records before and after k-anonymization.

As shown in FIG. 9, the usefulness evaluation unit 13 evaluates usefulness using the number of records in the target data and the number of records in the target data when k-anonymization has been performed on it. Specifically, the usefulness evaluation unit 13 varies the value of k, calculates for each value the ratio of the number of records after k-anonymization to the number of records before anonymization (the reduction rate), and plots the results as a graph.
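The record-count comparison can be sketched as follows. The deletion rule is an assumption made for illustration: records whose quasi-identifier combination occurs fewer than k times are treated as removed by k-anonymization. The column names and data are hypothetical, and the function returns the ratio of records after anonymization to records before it.

```python
from collections import Counter

def remaining_ratio(records, quasi_identifiers, k):
    """Ratio of records kept after dropping groups smaller than k (assumes records is non-empty)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    kept = sum(1 for key in keys if sizes[key] >= k)
    return kept / len(records)

# Hypothetical target data.
records = [
    {"age": "20s", "sex": "F"},
    {"age": "20s", "sex": "F"},
    {"age": "20s", "sex": "M"},
    {"age": "30s", "sex": "M"},
    {"age": "30s", "sex": "M"},
    {"age": "30s", "sex": "M"},
]

# One point per k, ready to be plotted with k on the horizontal axis.
ratios = {k: remaining_ratio(records, ["age", "sex"], k) for k in range(1, 5)}
print(ratios)
```

Subtracting each ratio from 1 gives the fraction of records removed, if that is the quantity to be plotted instead.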

Thus, with the graph shown in FIG. 9, the evaluator can see at a glance how many records are deleted by the k-anonymization process. As a result, the evaluator can grasp the usefulness of the evaluation target data after k-anonymization.
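The calculation behind a graph like FIG. 9 can be sketched in Python as follows. This is a minimal sketch under a stated assumption: records whose combination of quasi-identifier values occurs fewer than k times are deleted. The text says only that records are deleted by k-anonymization, so this suppression rule, the function names `retained_ratio` and `usefulness_curve`, and the sample records are all hypothetical.

```python
from collections import Counter


def retained_ratio(records, quasi_identifiers, k):
    """Ratio of records left after suppression-based k-anonymization.

    ASSUMPTION: records whose quasi-identifier combination occurs fewer
    than k times are deleted; the text does not fix the method.
    """
    if not records:
        raise ValueError("records must not be empty")
    # One key per record: the tuple of its quasi-identifier values.
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    group_sizes = Counter(keys)
    kept = sum(1 for key in keys if group_sizes[key] >= k)
    return kept / len(records)


def usefulness_curve(records, quasi_identifiers, k_values):
    """Map each k to the retained-record ratio (data for a FIG. 9 style plot)."""
    return {k: retained_ratio(records, quasi_identifiers, k) for k in k_values}


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"age": "30s", "gender": "F"},
        {"age": "30s", "gender": "F"},
        {"age": "30s", "gender": "F"},
        {"age": "40s", "gender": "M"},
        {"age": "40s", "gender": "M"},
        {"age": "50s", "gender": "F"},
    ]
    curve = usefulness_curve(sample, ["age", "gender"], range(1, 5))
    for k, ratio in curve.items():
        print(k, ratio)
```

Plotting the returned mapping with k on the horizontal axis gives the kind of curve the evaluator would read: the faster the ratio falls as k grows, the more records are lost and the lower the usefulness of the anonymized data.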

[Effects of the Embodiment]
As described above, the present embodiment provides a tool for evaluating the individual specification risk, the individual identification risk, and the usefulness, and can thereby contribute to the formulation of guidelines that balance risk against usefulness.

[Program]
The program in the present embodiment may be any program that causes a computer to execute steps A1 to A5 shown in FIG. 3. The evaluation device 10 and the evaluation method according to the present embodiment can be realized by installing this program on a computer and executing it. In this case, the CPU (Central Processing Unit) of the computer functions as the individual identification risk evaluation unit 11, the individual specification risk evaluation unit 12, the usefulness evaluation unit 13, the data acquisition unit 14, and the data output unit 15, and performs the processing.

The program according to the present embodiment may also be executed by a computer system made up of a plurality of computers. In this case, for example, each computer may function as any one of the individual identification risk evaluation unit 11, the individual specification risk evaluation unit 12, the usefulness evaluation unit 13, the data acquisition unit 14, and the data output unit 15.

A computer that realizes the evaluation device 10 by executing the program according to the present embodiment will now be described with reference to FIG. 10. FIG. 10 is a block diagram showing an example of a computer that realizes the evaluation device according to the embodiment of the present invention.

As shown in FIG. 10, the computer 110 includes a CPU 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These units are connected to one another via a bus 121 so that they can exchange data.

The CPU 111 loads the program (code) according to the present embodiment, which is stored in the storage device 113, into the main memory 112 and executes the code in a predetermined order to perform various calculations. The main memory 112 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory). The program according to the present embodiment is provided in a state of being stored in a computer-readable recording medium 120. The program according to the present embodiment may also be distributed over the Internet, to which the computer is connected via the communication interface 117.

Specific examples of the storage device 113 include a hard disk drive and a semiconductor storage device such as a flash memory. The input interface 114 mediates data transmission between the CPU 111 and an input device 118 such as a keyboard or a mouse. The display controller 115 is connected to the display device 119 and controls the display on the display device 119.

The data reader/writer 116 mediates data transmission between the CPU 111 and the recording medium 120, reads the program from the recording medium 120, and writes the results of processing in the computer 110 to the recording medium 120. The communication interface 117 mediates data transmission between the CPU 111 and other computers.

Specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic storage media such as a flexible disk, and optical storage media such as a CD-ROM (Compact Disk Read Only Memory).

The evaluation device 10 according to the present embodiment can also be realized by using hardware corresponding to each unit, instead of a computer on which the program is installed. Furthermore, the evaluation device 10 may be realized partly by the program and partly by hardware.

As described above, according to the present invention, data that has undergone k-anonymization processing can be evaluated. The present invention is useful in various fields in which anonymization of personal information is required.

10 Evaluation device
11 Individual identification risk evaluation unit
12 Individual specification risk evaluation unit
13 Usefulness evaluation unit
14 Data acquisition unit
15 Data output unit
20 Database
30 Terminal device
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus

Claims

An evaluation device for evaluating, as target data, data including personal information acquired from a plurality of individuals, the evaluation device comprising:
an individual identification risk evaluation unit that calculates the number of records in which the values of the quasi-identifiers constituting the personal information match, obtains from the calculated number the number of people specified by k when anonymization processing is performed on the target data, and evaluates the obtained number of people as an individual identification risk;
an individual specification risk evaluation unit that evaluates, as an individual specification risk, the possibility that an individual from whom the personal information was acquired is specified from the target data; and
a usefulness evaluation unit that evaluates usefulness when anonymization processing is performed on the target data.

The evaluation device according to claim 1, wherein
the individual specification risk evaluation unit evaluates the individual specification risk based on the individual identification risk and a coefficient that is preset for each of the quasi-identifiers and indicates the risk that an individual is specified from that quasi-identifier.

The evaluation device according to claim 1 or 2, wherein
the usefulness evaluation unit evaluates the usefulness by using the number of records in the target data and the number of records in the target data when anonymization processing is performed on the target data.

An evaluation method in which a computer evaluates, as target data, data including personal information acquired from a plurality of individuals, the evaluation method comprising:
(a) a step of calculating the number of records in which the values of the quasi-identifiers constituting the personal information match, obtaining from the calculated number the number of people specified by k when anonymization processing is performed on the target data, and evaluating the obtained number of people as an individual identification risk;
(b) a step of evaluating, as an individual specification risk, the possibility that an individual from whom the personal information was acquired is specified from the target data; and
(c) a step of evaluating usefulness when anonymization processing is performed on the target data.

The evaluation method according to claim 4, wherein
in the step (b), the individual specification risk is evaluated based on the individual identification risk and a coefficient that is preset for each of the quasi-identifiers and indicates the risk that an individual is specified from that quasi-identifier.

The evaluation method according to claim 4 or 5, wherein
in the step (c), the usefulness is evaluated by using the number of records in the target data and the number of records in the target data when anonymization processing is performed on the target data.

A program for causing a computer to evaluate, as target data, data including personal information acquired from a plurality of individuals, the program causing the computer to execute:
(a) a step of calculating the number of records in which the values of the quasi-identifiers constituting the personal information match, obtaining from the calculated number the number of people specified by k when anonymization processing is performed on the target data, and evaluating the obtained number of people as an individual identification risk;
(b) a step of evaluating, as an individual specification risk, the possibility that an individual from whom the personal information was acquired is specified from the target data; and
(c) a step of evaluating usefulness when anonymization processing is performed on the target data.

The program according to claim 7, wherein
in the step (b), the individual specification risk is evaluated based on the individual identification risk and a coefficient that is preset for each of the quasi-identifiers and indicates the risk that an individual is specified from that quasi-identifier.

The program according to claim 7, wherein
in the step (c), the usefulness is evaluated by using the number of records in the target data and the number of records in the target data when anonymization processing is performed on the target data.