JP6078437B2

JP6078437B2 - Personal information anonymization system

Info

Publication number: JP6078437B2
Application number: JP2013177035A
Authority: JP
Inventors: 和明井堀; 中村　雄一; 雄一中村
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2013-08-28
Filing date: 2013-08-28
Publication date: 2017-02-08
Anticipated expiration: 2033-08-28
Also published as: JP2015046030A

Description

The present invention relates to a personal information anonymization system that anonymizes personal information.

Advances in cloud computing and distributed processing have made it possible to analyze and utilize, within a realistic time, the large volumes of data that companies and organizations have accumulated. In many cases, however, these data contain information that can identify an individual, such as name and address, as well as information about an individual that does not necessarily identify that individual, such as age, occupation, and educational background. Such information is called personal information. To prevent privacy infringement, personal information must be utilized with due consideration for privacy protection.

Privacy cannot be protected simply by deleting the personal information defined by the Personal Information Protection Law. For example, even if information that identifies an individual by itself, such as an identification number, is deleted from personal information, the individual may still be identified by combining attributes such as age, nationality, and occupation, none of which identifies the individual on its own. Attributes that make identification possible when combined are called quasi-identifiers. k-anonymization is a technique for preventing identification through combinations of quasi-identifiers. It obscures attribute values so that every combination of quasi-identifier values in the personal information appears at least a predetermined number of times k, the anonymization threshold, in the data set. For example, k-anonymization obscures the combination of attribute values “31 years old, Japanese, system engineer” as “30s, Japanese, IT”.
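The k-anonymity condition described above can be illustrated with a minimal Python sketch. The helper names (`generalize_age`, `is_k_anonymous`), the sample records, and the occupation mapping are assumptions made for illustration and do not come from the patent; the sketch only checks whether every combination of quasi-identifier values occurs at least k times.

```python
from collections import Counter

# Hypothetical mapping from exact occupations to a coarser category.
OCCUPATION_GROUP = {
    "system engineer": "IT",
    "programmer": "IT",
    "network engineer": "IT",
}


def generalize_age(age: int) -> str:
    """Coarsen an exact age to its decade, e.g. 31 -> '30s'."""
    return f"{(age // 10) * 10}s"


def is_k_anonymous(records, k: int) -> bool:
    """Return True if every quasi-identifier tuple occurs at least k times."""
    counts = Counter(records)
    return all(count >= k for count in counts.values())


# Illustrative records: (age, nationality, occupation).
raw = [
    (31, "Japanese", "system engineer"),
    (34, "Japanese", "programmer"),
    (38, "Japanese", "network engineer"),
]

generalized = [
    (generalize_age(age), nationality, OCCUPATION_GROUP[occupation])
    for age, nationality, occupation in raw
]

print(generalized[0])                  # ('30s', 'Japanese', 'IT')
print(is_k_anonymous(raw, 2))          # False: every raw tuple is unique
print(is_k_anonymous(generalized, 3))  # True: all three tuples now coincide
```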

k-anonymization, a technique for anonymizing personal information, uses a data structure called a generalized hierarchy to obscure the attribute values of the quasi-identifiers of personal information. With the technique described in Patent Document 1, this generalized hierarchy can be generated according to the appearance frequency of the attribute values of the quasi-identifiers. Performing k-anonymization with a generalized hierarchy obtained in this way limits the loss of information accuracy.

The generalized hierarchy consists of nodes arranged in a tree structure with the root at the top, and is divided into several levels. Each level of the generalized hierarchy is called a layer.
Once the personal information, its quasi-identifiers, and a combination of layers, one from the generalized hierarchy of each quasi-identifier, are determined, the personal information can be anonymized according to that combination. Such a combination of layers is called an anonymization plan. Given personal information, the generalized hierarchies of its quasi-identifiers, and a k-anonymization threshold, there are generally several anonymization plans that satisfy the threshold, so the optimal plan must be selected from among them. The method described in Non-Patent Document 1 is known for selecting an anonymization plan. That method selects a plan based on criteria such as how close the result is to the specified k-anonymization threshold and how low the layers of the generalized hierarchies are.
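The notion of an anonymization plan can be sketched in Python as a tuple of layer indices, one per quasi-identifier. The hierarchies, the records, and the selection rule used here (the feasible plan with the smallest sum of layer indices) are illustrative assumptions; in particular, the selection rule is a simplified stand-in and not the method of Non-Patent Document 1.

```python
from collections import Counter
from itertools import product

# Each hierarchy is a list of layers; layer 0 holds the original values.
# hierarchy[layer][original value] -> value shown at that layer.
AGE = [
    {31: "31", 34: "34", 45: "45", 47: "47"},
    {31: "30s", 34: "30s", 45: "40s", 47: "40s"},
    {31: "*", 34: "*", 45: "*", 47: "*"},
]
NATIONALITY = [
    {"JP": "JP", "FR": "FR"},
    {"JP": "*", "FR": "*"},
]
HIERARCHIES = [AGE, NATIONALITY]


def apply_plan(records, plan):
    """Replace each value by its generalization at the layer chosen in plan."""
    return [
        tuple(h[layer][value] for h, layer, value in zip(HIERARCHIES, plan, rec))
        for rec in records
    ]


def feasible_plans(records, k):
    """Enumerate all layer combinations whose result satisfies the threshold k."""
    plans = []
    for plan in product(*(range(len(h)) for h in HIERARCHIES)):
        counts = Counter(apply_plan(records, plan))
        if min(counts.values()) >= k:
            plans.append(plan)
    return plans


records = [(31, "JP"), (34, "JP"), (45, "FR"), (47, "FR")]
plans = feasible_plans(records, k=2)
best = min(plans, key=sum)  # assumed rule: prefer the lowest layers overall

print(plans)  # [(1, 0), (1, 1), (2, 0), (2, 1)]
print(best)   # (1, 0): ages by decade, nationality unchanged
```

Several plans satisfy k = 2 here, which is why a selection criterion is needed at all.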

Because the quality of the anonymized data obtained by k-anonymization depends on the generalized hierarchy and the anonymization plan, the effect of editing the generalized hierarchy needs to be made visible. The amount of lost information is one index of the quality of anonymized data. Several techniques for calculating it have been proposed; Patent Document 1 also proposes a technique that calculates it as information entropy based on the appearance frequency of attribute values. This technique has the advantage that the amount of information can be evaluated in a way that reflects the personal information being anonymized.
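As a rough illustration of a frequency-based entropy measure, the following Python sketch compares the Shannon entropy of an attribute before and after generalization. This is only an assumed, simplified formulation: the exact definition used in Patent Document 1 and the equations of this description are not reproduced here, and the function names and sample frequencies are hypothetical.

```python
import math
from collections import Counter


def entropy(freq):
    """Shannon entropy in bits of a frequency table {value: count}."""
    total = sum(freq.values())
    return -sum((c / total) * math.log2(c / total) for c in freq.values())


def information_loss_rate(before, mapping):
    """Fraction of the original entropy lost when values are merged by mapping."""
    after = Counter()
    for value, count in before.items():
        after[mapping[value]] += count
    h_before = entropy(before)
    if h_before == 0:
        return 0.0  # nothing to lose if the attribute carried no information
    return (h_before - entropy(after)) / h_before


# Hypothetical frequency table for a "nationality" attribute.
before = {"France": 1, "China": 1, "UK": 1, "Germany": 1}
mapping = {
    "France": "France/China",
    "China": "France/China",
    "UK": "UK/Germany",
    "Germany": "UK/Germany",
}

rate = information_loss_rate(before, mapping)
print(rate)  # 0.5: entropy drops from 2 bits to 1 bit
```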

Patent Document 1: International Publication WO 2011/145401 pamphlet

Non-Patent Document 1: Hitachi Consulting Co., Ltd., “FY2009 Ministry of Economy, Trade and Industry Information Grand Voyage Project (Development, Improvement, and Verification of Common Infrastructure Technology): Development, Completion, and Verification of Personal Information Protection and Analysis Infrastructure, External Specification of the Personal Information Anonymization Platform”, p. 53 (March 2010)

The generalized hierarchy obtained with the technique of Patent Document 1 limits the loss of information accuracy, but it is not necessarily suited to the purpose of the analysis. For example, suppose the values “France” (仏) and “China” (中) of the quasi-identifier for nationality are generalized in the next higher layer to the value “France/China” (仏中). This generalization is inappropriate when the analysis of the anonymized data emphasizes the world region to which the country of each nationality belongs.
On the other hand, if the user builds a generalized hierarchy by hand without using a technique such as that of Patent Document 1, the user cannot tell how much information will be lost until anonymization has actually been run once with that hierarchy, so anonymized data suited to the purpose of the analysis is not necessarily obtained.

An object of the present invention is to provide a personal information anonymization system in which the generalized hierarchy can be edited to obtain anonymized data suited to the purpose of the analysis.

To solve the above problem, the personal information anonymization system of the present invention comprises:
generalized hierarchy creating means for creating, for each quasi-identifier, a tree-structured generalized hierarchy based on anonymization target data that contains combinations of attribute values of a plurality of quasi-identifiers to be anonymized, the generalized hierarchy consisting of a lowest layer that includes a plurality of nodes, each having an attribute value contained in the anonymization target data and the appearance frequency of that attribute value, and layers that each include one or more nodes, each having an attribute value whose degree of anonymization is the same as or higher than that of the layer below and the appearance frequency of that attribute value;
generalized hierarchy presenting means for presenting to the user, in response to the user's designation of a quasi-identifier, the generalized hierarchy of the designated quasi-identifier;
generalized hierarchy editing means for updating, in accordance with an editing instruction from the user, the generalized hierarchy presented by the generalized hierarchy presenting means while preserving its tree structure, and presenting the updated generalized hierarchy to the user again; and
anonymization means for creating, in response to the user's designation of a layer of the generalized hierarchy of each quasi-identifier, anonymized data in which the attribute value of each quasi-identifier contained in the anonymization target data is replaced with the attribute value of a node belonging to the designated layer of the generalized hierarchy of that quasi-identifier.

Preferably, the personal information anonymization system of the present invention further comprises loss index calculating means for obtaining an information loss index for each layer of the generalized hierarchy, and the generalized hierarchy presenting means and the generalized hierarchy editing means present to the user the information loss index of each layer of the generalized hierarchy obtained by the loss index calculating means.

Preferably, in the personal information anonymization system of the present invention, the anonymization means modifies the anonymized data so that the appearance frequency of each combination of quasi-identifier attribute values in the anonymized data satisfies the k-anonymization threshold specified by the user.

Preferably, in the personal information anonymization system of the present invention, editing by the generalized hierarchy editing means includes renaming a node, moving a node, adding a layer, and deleting a layer.

According to the present invention, the generalized hierarchy can be edited to obtain anonymized data suited to the purpose of the analysis.

FIG. 1 is a diagram showing an example of the configuration of a personal information anonymization system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the generalized hierarchy editing screen.
FIG. 3 is a diagram showing an example of the anonymization target data import dialog screen.
FIG. 4 is a diagram showing an example of the anonymization execution dialog screen.
FIG. 5 is a diagram showing an example of the items selectable from the context menu displayed in the generalized hierarchy editor.
FIG. 6 is a diagram showing an example of “Rename node”.
FIG. 7 is a diagram showing a first example of “Move node”.
FIG. 8 is a diagram showing a second example of “Move node”.
FIG. 9 is a diagram showing a third example of “Move node”.
FIG. 10 is a diagram showing an example of the structure of the anonymization target data.
FIG. 11 is a diagram showing an example of the structure of the management data.
FIG. 12 is a diagram showing an example of the structure of the generalized hierarchy data.
FIG. 13 is a diagram showing an example of the structure of the quasi-identifier tuple frequency data.
FIG. 14 is a diagram showing an example of the flow of the post-anonymization frequency table calculation process in the post-anonymization frequency table calculation unit.
FIG. 15 is a diagram showing examples of generalized hierarchies. FIG. 15(A) shows an example for the quasi-identifier “age”, FIG. 15(B) for the quasi-identifier “nationality”, and FIG. 15(C) for the quasi-identifier “major”.
FIG. 16 is a diagram showing an example of the post-anonymization tuple frequency table.
FIG. 17 is a diagram showing an example of the flow of the loss index calculation process in the loss index calculation unit.
FIG. 18 is a diagram showing examples of the frequency table for each quasi-identifier. FIG. 18(A) shows an example for the quasi-identifier “age”, FIG. 18(B) for the quasi-identifier “nationality”, and FIG. 18(C) for the quasi-identifier “major”.
FIG. 19 is a diagram showing examples of the post-anonymization frequency table for each quasi-identifier. FIG. 19(A) shows an example for the quasi-identifier “age”, FIG. 19(B) for the quasi-identifier “nationality”, and FIG. 19(C) for the quasi-identifier “major”.
FIG. 20 is a diagram showing an example of the flow of the generalized hierarchy creation process.
FIG. 21 is a diagram showing an example of the flow of the generalized hierarchy presentation process.
FIG. 22 is a diagram showing an example of the flow of the generalized hierarchy editing process.
FIG. 23 is a diagram showing an example of the flow of the generalized hierarchy saving process in the generalized hierarchy storage unit.
FIG. 24 is a diagram showing an example of the flow of the anonymization process in the anonymization processing unit.

A personal information anonymization system according to an embodiment of the present invention is described below with reference to the accompanying drawings. Note, however, that this embodiment is merely one example of implementing the present invention and does not limit its technical scope.
In all drawings describing the embodiment, common components are given the same reference numerals, and repeated description is omitted.

(Overall configuration)
FIG. 1 shows an example of the configuration of a personal information anonymization system 100 according to an embodiment of the present invention.
The personal information anonymization system 100 is connected to a network 180 such as the Internet. A generalized hierarchy editing reception device 170 is also connected to the network 180. The personal information anonymization system 100 and the generalized hierarchy editing reception device 170 can communicate with each other via the network 180.
The generalized hierarchy editing reception device 170 provides the user with a user interface for editing a generalized hierarchy. It is implemented by, for example, a personal computer. The device may implement the user interface as a web page displayed in a web browser, or as a window of a client-server application built with SWT (Standard Widget Toolkit), a graphical user interface component for Java (registered trademark).

The personal information anonymization system 100 includes an anonymization device 101 and a personal information management device 150.
The anonymization device 101 is, for example, an application server, and includes a CPU (Central Processing Unit), a main memory composed of RAM (Random Access Memory) or the like, and a storage device composed of a hard disk or the like.
The storage device of the anonymization device 101 stores an anonymization program.
The CPU of the anonymization device 101 loads the anonymization program from the storage device into the main memory and executes it, thereby implementing the functions of a user interface unit 110, a generalized hierarchy creation unit 111, a generalized hierarchy presentation unit 112, a generalized hierarchy editing unit 113, a post-anonymization frequency table calculation unit 114, a loss index calculation unit 115, a generalized hierarchy storage unit 116, and an anonymization processing unit 117. A temporary storage unit 120 is a storage area provided in the main memory or the storage device of the anonymization device 101.
The personal information management device 150 is implemented as storage on a database product, for example a relational database or a key-value distributed database. It manages anonymization target data 151, management data 152, generalized hierarchy data 153, quasi-identifier tuple frequency data 154, and anonymized data 155.
The anonymization device 101 and the personal information management device 150 may be implemented on the same computer or on separate computers.

As described later, the user interface unit 110 causes the generalized hierarchy editing reception device 170 to display the generalized hierarchy editing screen 200 shown in FIG. 2. When the generalized hierarchy editing reception device 170 sends the information entered on the generalized hierarchy editing screen 200 to the anonymization device 101, the user interface unit 110 receives that information and passes it to the other units.
On receiving information such as the name of the anonymization target data, the location of the data file in which the anonymization target data is stored, the k-anonymization threshold, and attribute information, the generalized hierarchy creation unit 111 acquires the data file. It then creates anonymization target data, generalized hierarchy data, and quasi-identifier tuple frequency data from the data file, and writes the anonymization target data 151, management data 152, generalized hierarchy data 153, and quasi-identifier tuple frequency data 154 to the personal information management device 150.
On receiving a quasi-identifier designated by the user of the generalized hierarchy editing reception device 170, the generalized hierarchy presentation unit 112 reads the generalized hierarchy data 153 of the designated quasi-identifier from the personal information management device 150, expands it into a tree structure, and stores it in the temporary storage unit 120. It then displays the generalized hierarchy and related information stored in the temporary storage unit 120 on the generalized hierarchy editing screen of the generalized hierarchy editing reception device 170. Displaying the generalized hierarchy and related information on this screen is an example of “presenting the generalized hierarchy of the designated quasi-identifier to the user” in the present invention.
In response to an editing instruction from the user, the generalized hierarchy editing unit 113 updates the generalized hierarchy presented by the generalized hierarchy presentation unit 112 while preserving its tree structure.

The post-anonymization frequency table calculation unit 114 calculates the appearance frequency of each combination of quasi-identifier values that results when the quasi-identifier attribute values of the anonymization target data are anonymized, and outputs a post-anonymization tuple frequency table containing the anonymized combinations and their appearance frequencies.
Based on the post-anonymization tuple frequency table output by the post-anonymization frequency table calculation unit 114, the loss index calculation unit 115 calculates the amount of information before and after anonymization and computes the information loss rate, an information loss index that indicates how much information accuracy has been lost relative to the data before anonymization.
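The aggregation performed for the post-anonymization tuple frequency table can be sketched in Python as follows. The input format (a dictionary from quasi-identifier tuples to counts), the per-attribute mappings, and the function name are assumptions made for illustration; the actual structure of the quasi-identifier tuple frequency data 154 is not specified in this part of the description.

```python
from collections import Counter


def post_anonymization_tuple_frequencies(tuple_freq, mappings):
    """Aggregate tuple counts after generalizing each attribute.

    tuple_freq: {tuple of original attribute values: count}
    mappings:   one dict per attribute, original value -> generalized value
                (values missing from a mapping are left unchanged)
    """
    table = Counter()
    for values, count in tuple_freq.items():
        generalized = tuple(m.get(v, v) for m, v in zip(mappings, values))
        table[generalized] += count
    return table


# Hypothetical frequencies of (age, nationality) tuples before anonymization.
tuple_freq = {("31", "FR"): 2, ("34", "CN"): 1, ("45", "FR"): 3}
mappings = [
    {"31": "30s", "34": "30s", "45": "40s"},  # age generalized to decades
    {"FR": "FR/CN", "CN": "FR/CN"},           # nationality merged
]

table = post_anonymization_tuple_frequencies(tuple_freq, mappings)
print(dict(table))  # {('30s', 'FR/CN'): 3, ('40s', 'FR/CN'): 3}
```

Working from stored tuple frequencies rather than from individual records keeps the recomputation cheap when the user changes layers or edits a hierarchy, although whether the system is optimized in exactly this way is not stated here.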

In response to a save operation by the user, the generalized hierarchy storage unit 116 writes the generalized hierarchy held in the temporary storage unit 120 to the personal information management device 150 as generalized hierarchy data 153.
In response to an anonymization request from the user, the anonymization processing unit 117 derives anonymized data from the anonymization target data. It then modifies the anonymized data so that it satisfies the k-anonymization threshold specified by the user, and writes it to the personal information management device 150 as anonymized data 155.
The generalized hierarchy creation unit 111 is an example of the generalized hierarchy creating means of the present invention, the generalized hierarchy presentation unit 112 is an example of the generalized hierarchy presenting means, the generalized hierarchy editing unit 113 is an example of the generalized hierarchy editing means, the anonymization processing unit 117 is an example of the anonymization means, and the post-anonymization frequency table calculation unit 114 is an example of the post-anonymization frequency table calculating means.

(Generalized hierarchy editing reception device)
The generalized hierarchy editing reception device 170 displays the generalized hierarchy editing screen 200 shown in FIG. 2 on its display.
The generalized hierarchy editing screen 200 includes a menu bar 210, an anonymization setting display area 250, a loss index evaluation area 260, and a generalized hierarchy editor 270.
The menu bar 210 displays the items “Data” and “Generalized hierarchy”. When the “Data - Import” menu 220 is selected, for example by left-clicking the menu bar 210 with the mouse, the anonymization target data import dialog 221 shown in FIG. 3 opens on the display. As described later, in the anonymization target data import dialog 221 the user can specify the data name and the path of the data file, set the k-anonymization threshold, and select the attributes and quasi-identifiers of the anonymization target data.
When the “Data - Anonymize” menu 230 is selected in the same way, the anonymization execution dialog 231 shown in FIG. 4 opens on the display.
When the “Generalized hierarchy - Save” menu 240 is selected in the same way, the generalized hierarchy is saved to the generalized hierarchy data 153 of the personal information management device 150.

The anonymization setting display area 250 displays the data name and k-anonymization threshold set in the anonymization target data import dialog 221, together with the list of attributes and the list of quasi-identifiers of the anonymization target data. In FIG. 2 the attribute list and the quasi-identifier list are displayed in separate areas, but they may instead be displayed in a single area with the attributes that are quasi-identifiers highlighted, for example in bold. Buttons for adding and removing quasi-identifiers may also be provided between the attribute area and the quasi-identifier area.

The loss index evaluation area 260 displays, for every quasi-identifier, the generalized hierarchy layer selected in the anonymization execution dialog 231, together with the information loss rate that results when anonymization is performed with those layers.
When a generalized hierarchy layer of a quasi-identifier is selected in the anonymization execution dialog 231, the post-anonymization frequency table calculation unit 114 calculates a post-anonymization tuple frequency table for the selected layers, and the information loss rate is obtained from this table. The information loss rate is calculated by the method defined in Equations 1 to 9, described later.

The generalized hierarchy editor 270 displays the generalized hierarchy of each quasi-identifier, showing the original or generalized (anonymized) attribute values of the quasi-identifier, their appearance frequencies, and the information loss rate (IL: Information Loss rate) at each layer. Here, the appearance frequency is, for example, the number of times an attribute value of the quasi-identifier appears in the anonymization target data. The generalized hierarchy editor 270 also updates the displayed generalized hierarchy and information loss rates in response to editing operations by the user, such as moving or renaming a node or adding or deleting a layer.

一般化階層は、匿名化対象データに含まれる準識別子の属性値およびその出現頻度を有する複数のノードを含む最下層のレイヤと、下層のレイヤに比べて匿名化の程度が同一またはより高い属性値およびその出現頻度を有する１つ以上のノードを含むレイヤとによって構成される。
例えば、図２中の一般化階層エディタ２７０には、あるパーソナル情報について「国籍」という準識別子の一般化階層が表示されている。各ボックスの上段は「国籍」という準識別子の属性値または一般化（匿名化）された属性値を表し、下段はその出現頻度を表す。
一般化階層は２個以上任意個のレイヤＬ_０，…，Ｌ_Ｊ−１から構成され、各々のレイヤは１個以上任意個のノードを持つ。さらに、隣接するレイヤ間には親子関係がある。つまり、ｊ＝０，…，Ｊ−２に対して、Ｌ_ｊはＬ_ｊ＋１の子レイヤ、Ｌ_ｊ＋１はＬ_ｊの親レイヤである。さらに、各々のレイヤに属するノードは、上位レイヤに高々１つの親ノードを、下位レイヤに０個以上任意個の子ノードを持つ。ノードが親ノードを持たないのは、該ノードが属するレイヤが最上層のレイヤＬ_Ｊ−１、つまりルートレイヤであるときである。ノードが子ノードを持たないのは、そのノードが属するレイヤが最下層のレイヤＬ_０、つまりリーフレイヤであるときである。また、ルートレイヤに含まれるノードをルートノード、リーフレイヤに含まれるノードをリーフノードという。 The generalization hierarchy is an attribute value of the quasi-identifier included in the data to be anonymized and an attribute having the same or higher degree of anonymization than the lowermost layer including a plurality of nodes having the appearance frequency and the lower layer A layer including one or more nodes having a value and its frequency of appearance.
For example, the generalized hierarchy editor 270 in FIG. 2 displays a generalized hierarchy of a quasi-identifier “nationality” for certain personal information. The upper row of each box represents the attribute value of the quasi-identifier “nationality” or the generalized (anonymized) attribute value, and the lower row represents the appearance frequency thereof.
The generalized hierarchy is composed of two or more layers L_0, …, L_{J−1}, and each layer has one or more nodes. Adjacent layers are in a parent-child relationship: for j = 0, …, J−2, L_j is the child layer of L_{j+1}, and L_{j+1} is the parent layer of L_j. Each node has at most one parent node in the layer above and zero or more child nodes in the layer below. A node has no parent node when it belongs to the uppermost layer L_{J−1}, that is, the root layer. A node has no child node when it belongs to the lowest layer L_0, that is, the leaf layer. A node in the root layer is referred to as a root node, and a node in the leaf layer is referred to as a leaf node.
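The layer and node rules above can be sketched as a small data structure. This is only an illustrative sketch: the names `Node`, `link`, and `ancestors` are hypothetical and do not appear in the patent, and the frequency values are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    value: str                       # raw or generalized attribute value
    frequency: int                   # appearance frequency in the target data
    layer: int                       # 0 is the leaf layer, J-1 the root layer
    parent: Optional["Node"] = None  # at most one parent, in the layer above
    children: List["Node"] = field(default_factory=list)


def link(parent: Node, child: Node) -> None:
    """Attach child under parent, enforcing the adjacency rules in the text."""
    if parent.layer != child.layer + 1:
        raise ValueError("parent must be in the layer directly above the child")
    if child.parent is not None:
        raise ValueError("a node has at most one parent")
    child.parent = parent
    parent.children.append(child)


def ancestors(node: Node) -> List[str]:
    """Values of all nodes reached by following parent links upward."""
    found = []
    while node.parent is not None:
        node = node.parent
        found.append(node.value)
    return found


# Illustrative values only (assumed, not taken from the figures).
france = Node("France", 5, layer=0)
china = Node("China", 5, layer=0)
france_china = Node("France-China", 10, layer=1)
link(france_china, france)
link(france_china, china)
```

Storing the layer number on each node keeps the adjacency check local to `link`; a real implementation might instead keep one list of nodes per layer.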

図２中の一般化階層エディタ２７０に表示されている例では、第５レイヤがルートレイヤであり、ルートノードの属性値は「仏中英独米日」である。また、第０レイヤがリーフレイヤであり、リーフノードの属性値は「仏」、「中」、「英」、「独」、「米」、および「日」である。第０レイヤは、匿名化する前の属性値各々を有するノードを含む。なお、例えば、「日」は第０レイヤから第４レイヤまでに含まれるが、第０レイヤの「日」がリーフノードであり、第１レイヤから第４レイヤまでの「日」はリーフノードではない。
以下では、あるレイヤのノード（親ノード）につながる上層のレイヤのノードを祖先ノードという。第５レイヤの「仏中英独米日」、第４レイヤの「仏中英独米」、および第３レイヤの「仏中英独」は、第２レイヤのノード「仏中英」の祖先ノードである。また、あるレイヤのノード（親ノード）につながる下層のレイヤのノードを子孫ノードという。たとえば、第１レイヤの「仏中」と「英」および第０レイヤの「仏」と「中」と「英」は、第２レイヤの親ノード「仏中英」の子孫ノードである。また、第０〜第３レイヤのノード「日」は、第４レイヤの親ノード「日」の子孫ノードである。 In the example displayed in the generalized hierarchy editor 270 in FIG. 2, the fifth layer is the root layer, and the attribute value of the root node is “France-China-UK-Germany-US-Japan” (仏中英独米日). The 0th layer is the leaf layer, and the attribute values of the leaf nodes are “France”, “China”, “UK”, “Germany”, “US”, and “Japan”. The 0th layer contains a node for each attribute value before anonymization. Note that, for example, “Japan” appears in the 0th through 4th layers, but only “Japan” in the 0th layer is a leaf node; “Japan” in the 1st through 4th layers is not a leaf node.
Hereinafter, a node in a higher layer that is connected to a node of a given layer is referred to as an ancestor node. “France-China-UK-Germany-US-Japan” in the fifth layer, “France-China-UK-Germany-US” in the fourth layer, and “France-China-UK-Germany” in the third layer are ancestor nodes of the node “France-China-UK” in the second layer. Likewise, a node in a lower layer that is connected to a node of a given layer is referred to as a descendant node. For example, “France-China” and “UK” in the first layer and “France”, “China”, and “UK” in the 0th layer are descendant nodes of the node “France-China-UK” in the second layer. The nodes “Japan” in the 0th through 3rd layers are descendant nodes of the node “Japan” in the fourth layer.

（一般化階層エディタにおける編集）
一般化階層エディタ２７０上でマウスにより右クリックするなどの操作によって、コンテキストメニューを開くことができる。コンテキストメニューでは、「ノード名変更」、「ノード移動」、「レイヤ追加（上）」、「レイヤ追加（下）」および「レイヤ削除」等の各操作を行うことができる。
図５は、一般化階層エディタ２７０上に表示されるコンテキストメニューで選択可能な項目の一例を示す。選択可能な項目はレイヤに依存するが、加えてノード上で右クリックしたかそれ以外の空白エリア（以下、非ノードという。）で右クリックしたかにも依存する。
図５において、Ｎはルートレイヤのレイヤ番号であり、１以上の整数値を取る。一般化階層のレイヤの総数が２（Ｎ＝１）である場合は、第１レイヤはルートレイヤである。この場合、ルートノードに対する操作を行うときは、図５におけるレイヤ「１」の場合とレイヤ「Ｎ」の場合のＡＮＤを適用する。たとえば、第１階層でノードを選択したとき、レイヤ「１」の場合では「ノード移動」は○になっているが、レイヤ「Ｎ」の場合では「ノード移動」は×になっているので、「ノード移動」は選択できない。一般化階層のレイヤの総数が３（Ｎ＝２）である場合、第１レイヤはルートレイヤではない。この場合に第１レイヤに対する操作を行うときは、図５におけるレイヤ「１」の場合を適用する。一般化階層のレイヤの総数が３（Ｎ＝２）である場合は、第２レイヤがルートレイヤである。この場合、ルートノードに対する操作を行うときは、図５におけるレイヤ「Ｎ」の場合を適用する。
例えば、図５によれば、一般化階層のレイヤの総数が２以上のときに第１レイヤ上のノードで右クリックしたときは「ノード名変更」と「ノード移動」が選択可能である。一方、第１レイヤの非ノード（ノードがないところ、すなわち空白エリア）で右クリックしたときは「レイヤ追加（上）」「レイヤ追加（下）」「レイヤ削除」が選択可能である。 (Edit in generalized hierarchy editor)
The context menu can be opened by an operation such as right-clicking on the generalized hierarchy editor 270 with the mouse. In the context menu, operations such as “change node name”, “move node”, “add layer (upper)”, “add layer (lower)”, and “delete layer” can be performed.
FIG. 5 shows an example of the items selectable from the context menu displayed on the generalized hierarchy editor 270. The selectable items depend on the layer, and also on whether the user right-clicked on a node or on a blank area elsewhere (hereinafter referred to as a non-node).
In FIG. 5, N is the layer number of the root layer and takes an integer value of 1 or more. When the generalized hierarchy has two layers in total (N = 1), the first layer is the root layer. In this case, for an operation on the root node, the logical AND of the entries for layer “1” and layer “N” in FIG. 5 applies. For example, when a node in the first layer is selected, “move node” is marked ○ for layer “1” but × for layer “N”, so “move node” cannot be selected. When the generalized hierarchy has three layers in total (N = 2), the first layer is not the root layer, so the entries for layer “1” in FIG. 5 apply to operations on the first layer. In that case the second layer is the root layer, and the entries for layer “N” in FIG. 5 apply to operations on the root node.
For example, according to FIG. 5, when the generalized hierarchy has two or more layers in total, right-clicking on a node in the first layer makes “change node name” and “move node” selectable. Right-clicking on a non-node of the first layer (a blank area with no node) makes “add layer (upper)”, “add layer (lower)”, and “delete layer” selectable.

図６は、「ノード名変更」の一例を示す。図６の例では第２レイヤの米をアメリカに変更している。
「ノード名変更」操作を行う場合、まず一般化階層エディタ２７０に表示されているノードをマウスで右クリックするとコンテキストメニューが表示される。次に、そのコンテキストメニュー上に表示されている「ノード名変更」をマウスで左クリックして選択する。これにより、一般化階層エディタ２７０上でこの選択されたノードのノード名を直接入力できるようになるので、キーボード等を用いて変更後のノード名を入力する。なお、ノード名変更後、ノード名を変更されたノードの祖先ノードや子孫ノード（リーフノードは除く）など、改名を推奨するノードを強調表示することとしてもよい。強調表示の方法としては、例えばノード名を赤色で表示したり、太線で表示する等の方法が考えられる。強調表示の例として、図６に改名を推奨するノードのボックスの枠線を太線化する例を示す。
「ノード名変更」では編集前後でノード間の階層構造は変わらないため、情報損失率は変わらない。 FIG. 6 shows an example of “change node name”. In the example of FIG. 6, the second-layer node “US” (米) is renamed to “America” (アメリカ).
When performing a “change node name” operation, a context menu is displayed when a node displayed in the generalized hierarchy editor 270 is first right-clicked with the mouse. Next, “Change Node Name” displayed on the context menu is selected by left-clicking with the mouse. As a result, the node name of the selected node can be directly input on the generalized hierarchy editor 270, and the changed node name is input using a keyboard or the like. Note that after the node name is changed, a node that is recommended to be renamed, such as an ancestor node or a descendant node (excluding a leaf node) of the node whose node name has been changed, may be highlighted. As a highlighting method, for example, a method of displaying the node name in red or displaying it with a thick line is conceivable. As an example of emphasis display, FIG. 6 shows an example in which the border of the box of the node whose name is recommended is bolded.
In “change node name”, the hierarchical structure between nodes does not change before and after editing, so the information loss rate does not change.

「ノード移動」操作を行う場合、まず一般化階層エディタ２７０に表示されているノードをマウスで右クリックするとコンテキストメニューが表示される。次に、そのコンテキストメニュー上に表示されている「ノード移動」をマウスで左クリックして選択する。そして、移動先の親ノードをマウスで左クリックする。
ノード移動は、移動対象ノードの子孫ノードも含めて階層構造を保ちながら移動する。その際、一般化階層は元のデータの属性値を曖昧化する階層であるので、レイヤ０にすべての匿名化前の属性値を持つ必要がある。このため、「ノード移動」は、移動元と移動先の階層によって以下の３種類に分けられる。
（１）移動によって階層が上がる場合：移動するノードの子孫ノードも一緒に移動し、移動するノードの末端にあるノードを第０レイヤまで伸ばす。第１レイヤのノード「中」を第３レイヤの「日」の下のレイヤ（第２レイヤ）に移動する例を図７に示す。
（２）移動によって階層が変わらない場合：移動するノードを子孫ノードごと移動する。第１レイヤのノード「中」を第２レイヤの「日」の下のレイヤ（第１レイヤ）に移動する例を図８に示す。
（３）移動によって階層が下がる場合：移動するノードを子孫ノードごと移動する。ただし、このまま移動すると、移動するノードの末端にあるノードが第０レイヤよりも下に位置してしまうので、移動前の第０レイヤからはみ出す分だけ下に階層を増やし、その最下層を改めて第０レイヤとする。さらに、移動するノードの子孫ノードに含まれないリーフノードは新しい第０レイヤまで伸ばす。第１レイヤのノード「仏独」を第１レイヤの「英」の下のレイヤ（第０レイヤ）に移動する例を図９に示す。
なお、ノード移動後、改名を推奨するノードを強調表示することとしてもよい。強調表示の例として、図７〜図９に改名を推奨するノードのボックスの枠線を太線化する例を示す。 When performing the “move node” operation, a context menu is displayed when the node displayed in the generalized hierarchy editor 270 is first right-clicked with the mouse. Next, “move node” displayed on the context menu is selected by left-clicking with the mouse. Then, left-click the destination parent node with the mouse.
The node movement is performed while maintaining the hierarchical structure including the descendant nodes of the movement target node. At this time, since the generalized hierarchy is a hierarchy that obfuscates the attribute values of the original data, it is necessary to have all the attribute values before anonymization in layer 0. For this reason, “node movement” is divided into the following three types depending on the hierarchy of the movement source and the movement destination.
(1) When the move raises the layer: the descendant nodes of the moved node move with it, and the nodes at the bottom end of the moved subtree are extended down to the 0th layer. FIG. 7 shows an example in which the first-layer node “China” is moved to the layer below the third-layer node “Japan” (that is, to the second layer).
(2) When the move does not change the layer: the node is moved together with its descendant nodes. FIG. 8 shows an example in which the first-layer node “China” is moved to the layer below the second-layer node “Japan” (that is, to the first layer).
(3) When the move lowers the layer: the node is moved together with its descendant nodes. If it were moved as is, however, the nodes at the bottom end of the moved subtree would fall below the 0th layer, so layers are added below by the amount that protrudes beyond the pre-move 0th layer, and the new lowest layer becomes the 0th layer. In addition, leaf nodes that are not descendants of the moved node are extended down to the new 0th layer. FIG. 9 shows an example in which the first-layer node “France-Germany” is moved to the layer below the first-layer node “UK” (that is, to the 0th layer).
Note that, after a node is moved, a node that is recommended to be renamed may be highlighted. As an example of highlighting, FIGS. 7 to 9 show examples in which the box line of a node whose name is recommended is thickened.

また、図５に示すように、第０レイヤのノード、つまりリーフノードの移動も可能である。例えば、図２の一般化階層エディタ２７０に例示されている一般化階層において、ノード「中」を第１レイヤ以上のノード「日」の下に移動し、移動先の親ノードを「アジア」に改名するというような操作が考えられる。この操作によるリーフノードの移動が行われると、移動されたリーフノードは第０レイヤまで子孫ノードが伸び、同じ属性値を持つリーフノードが作成される。
ただし、「ノード移動」では、リーフノードを親ノードに指定することはできない。リーフノードは匿名化される前の生の属性値を表しているので、リーフノードが他の生の属性値を子孫に持つことは考えられないからである。
また、あるノードをその子孫ノードの下に移動することはできない。このような移動を可能とすると、ノードの親子関係が循環してしまうからである。この点は、Ｗｉｎｄｏｗｓ（登録商標）のエクスプローラで、あるフォルダをそのサブフォルダの下に移動する操作はできないのと同様である。
「ノード移動」では、ノード移動完了後に情報損失率を再計算し、再計算された情報損失率を一般化階層エディタ２７０に表示する。 Further, as shown in FIG. 5, a node in the 0th layer, that is, a leaf node, can also be moved. For example, in the generalized hierarchy illustrated in the generalized hierarchy editor 270 of FIG. 2, the node “China” could be moved under a node “Japan” in the first layer or higher, and the destination parent node then renamed to “Asia”. When a leaf node is moved in this way, the moved node is extended with descendant nodes down to the 0th layer, and a leaf node having the same attribute value is created.
However, “move node” does not allow a leaf node to be designated as the parent node. A leaf node represents a raw attribute value before anonymization, so it cannot have another raw attribute value as a descendant.
A node also cannot be moved under one of its own descendant nodes, because such a move would make the parent-child relationship circular. This is the same as in Windows (registered trademark) Explorer, where a folder cannot be moved under one of its own subfolders.
In the “node movement”, the information loss rate is recalculated after the node movement is completed, and the recalculated information loss rate is displayed on the generalized hierarchy editor 270.
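The two restrictions on “move node” (no leaf node as the destination parent, and no move under the node's own descendant) can be checked before the move is carried out. The following is a minimal sketch under assumed data structures: `parent_of` and `layer_of` are hypothetical dictionaries keyed by node ID, and the `name@layer` ID format is invented here for illustration.

```python
from typing import Dict, Optional


def can_move(node: str, new_parent: str,
             parent_of: Dict[str, Optional[str]],
             layer_of: Dict[str, int]) -> bool:
    """Return True if `node` may be moved under `new_parent`."""
    # A leaf node (layer 0) holds a raw attribute value and cannot be a parent.
    if layer_of[new_parent] == 0:
        return False
    # Walk upward from the destination; reaching `node` means the destination
    # is `node` itself or one of its descendants, which would create a cycle.
    current: Optional[str] = new_parent
    while current is not None:
        if current == node:
            return False
        current = parent_of.get(current)
    return True


# A fragment of a nationality hierarchy (hypothetical IDs "name@layer").
parent_of = {
    "France@0": "France-China@1",
    "China@0": "France-China@1",
    "UK@0": "UK@1",
    "France-China@1": "France-China-UK@2",
    "UK@1": "France-China-UK@2",
    "France-China-UK@2": None,
}
layer_of = {name: int(name.rsplit("@", 1)[1]) for name in parent_of}
```

The check covers only the two rules stated in the text; the layer adjustments of cases (1) to (3) would be applied afterwards.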

「レイヤ追加（上）」、「レイヤ追加（下）」または「レイヤ削除」操作を行う場合、まず一般化階層エディタ２７０に表示されているレイヤの非ノード（ノードがないところ、すなわち空白エリア）をマウスで右クリックするとコンテキストメニューが表示される。次に、そのコンテキストメニュー上に表示されている「レイヤ追加（上）」、「レイヤ追加（下）」および「レイヤ削除」のいずれかをマウスで左クリックして選択する。
「レイヤ追加（上）」では、選択したレイヤの上に新しくレイヤを作成する。新しいレイヤのノードは、選択したレイヤのノードがそのまま現れる。選択したレイヤと新しいレイヤでのノード間の親子関係は、同じ名前のノード同士が親子になる。新しいレイヤとその親レイヤのノード間の親子関係は、レイヤ追加前の選択したレイヤとその親レイヤの親子関係を引き継ぐ。
ただし、ルートレイヤの上には新しいレイヤを作成できない。
「レイヤ追加（上）」では追加したレイヤでの情報損失率は、選択したレイヤのそれと同じである。 When performing “add layer (above)”, “add layer (below)”, or “delete layer” operation, first the non-node of the layer displayed in the generalized hierarchy editor 270 (where there is no node, ie, a blank area) Right-click with the mouse to display the context menu. Next, one of “add layer (top)”, “add layer (bottom)” and “delete layer” displayed on the context menu is selected by left-clicking with the mouse.
In “add layer (upper)”, a new layer is created above the selected layer. The nodes of the selected layer appear unchanged in the new layer. Between the selected layer and the new layer, nodes with the same name become parent and child. The parent-child relationships between the nodes of the new layer and its parent layer are inherited from those between the selected layer and its parent layer before the layer was added.
However, a new layer cannot be created above the root layer.
In “add layer (upper)”, the information loss rate in the added layer is the same as that of the selected layer.

「レイヤ追加（下）」では、選択したレイヤの下に新しくレイヤを作成する。新しいレイヤのノードは、選択したレイヤの子レイヤにあったノードがそのまま現れる。新しいレイヤの子レイヤと新しいレイヤでのノード間の親子関係は、同じ名前のノード同士が親子になる。新しいレイヤとその親レイヤのノード間の親子関係は、レイヤ追加前の選択したレイヤとその子レイヤの親子関係を引き継ぐ。
ただし、リーフレイヤの下には新しいレイヤを作成できない。
「レイヤ追加（下）」では追加したレイヤでの情報損失率は、選択したレイヤのレイヤ追加前の子レイヤのそれと同じである。 In “Add layer (lower)”, a new layer is created under the selected layer. As the node of the new layer, the node that was in the child layer of the selected layer appears as it is. In the parent-child relationship between the child layer of the new layer and the node in the new layer, the nodes having the same name become the parent-child. The parent-child relationship between the node of the new layer and its parent layer inherits the parent-child relationship of the selected layer and its child layer before adding the layer.
However, a new layer cannot be created under the leaf layer.
In “add layer (lower)”, the information loss rate in the added layer is the same as that of the child layer of the selected layer before adding the layer.

「レイヤ削除」では、選択したレイヤを削除する。削除するレイヤの一つ下のレイヤと一つ上のレイヤとの子孫関係が、レイヤ削除後の一般化階層での親子関係になる。
ただし、ルートレイヤとリーフレイヤの削除はできない。
「レイヤ削除」では編集前後でリーフレイヤと他レイヤとの子孫関係は変わらないため、情報損失率は変わらない。 In “delete layer”, the selected layer is deleted. The descendant relationship between the layer immediately below the layer to be deleted and the layer above is the parent-child relationship in the generalized hierarchy after the layer is deleted.
However, the root layer and leaf layer cannot be deleted.
In the “delete layer”, since the descendant relationship between the leaf layer and other layers does not change before and after editing, the information loss rate does not change.
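As an illustration of “delete layer”, the sketch below removes one intermediate layer and reconnects each child of a deleted node to that node's former parent, so that leaf nodes keep the same ancestors in the remaining layers. The function name and the dictionary representation (`parent_of`, `layer_of`) are assumptions made for this example, not structures from the patent.

```python
from typing import Dict, Optional, Tuple


def delete_layer(parent_of: Dict[str, Optional[str]],
                 layer_of: Dict[str, int],
                 target: int) -> Tuple[Dict[str, Optional[str]], Dict[str, int]]:
    """Return new (parent_of, layer_of) with layer `target` removed."""
    root_layer = max(layer_of.values())
    if target in (0, root_layer):
        raise ValueError("the leaf layer and the root layer cannot be deleted")
    new_parent: Dict[str, Optional[str]] = {}
    new_layer: Dict[str, int] = {}
    for node, layer in layer_of.items():
        if layer == target:
            continue  # nodes of the deleted layer disappear
        parent = parent_of.get(node)
        if parent is not None and layer_of[parent] == target:
            parent = parent_of[parent]  # skip over the deleted parent
        new_parent[node] = parent
        # Layers above the deleted one shift down by one.
        new_layer[node] = layer - 1 if layer > target else layer
    return new_parent, new_layer


# Three layers: leaves "a" and "b", intermediate "m", root "r" (hypothetical IDs).
parent_of = {"a": "m", "b": "m", "m": "r", "r": None}
layer_of = {"a": 0, "b": 0, "m": 1, "r": 2}
after_parent, after_layer = delete_layer(parent_of, layer_of, 1)
```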

（匿名化対象データ）
匿名化対象データ１５１は、図１０に示すような表形式をとる。匿名化対象データ１５１のレコードは、レコードの識別子であるキーと、レコードの属性であるいくつかの列を有する。ここでは、ＫＥＹという列が識別子であり、他に年齢、国籍、専攻及び通勤時間といった列を持つ。本実施形態ではこれらの列のうち、年齢、国籍及び専攻を準識別子とする。 (Data to be anonymized)
The anonymization target data 151 takes a tabular form as shown in FIG. The record of the anonymization target data 151 includes a key that is an identifier of the record and several columns that are attributes of the record. Here, the column “KEY” is an identifier, and there are columns such as age, nationality, major, and commuting time. In this embodiment, among these columns, age, nationality, and major are used as quasi-identifiers.

（管理データ）
管理データ１５２は、図１１に示すように、データ名と、ｋ匿名化閾値と、列情報と、一般化階層情報とを有する。
データ名は、匿名化対象のデータの名前である。
ｋ匿名化閾値は、データ名が表す匿名化対象データ１５１をｋ匿名化するときの閾値である。
列情報は、匿名化対象データ１５１が持つ属性情報である。列情報は、属性の名前と型を持ち、準識別子かどうかを表すフラグ（準識別子である場合Ｔｒｕｅ、準識別子でない場合Ｆａｌｓｅ）を持つ属性情報のリストである。
一般化階層情報は、準識別子の名前を示す準識別子名と、準識別子に対する一般化階層名とを含む情報のリストである。ここで、一般化階層名は、パーソナル情報管理装置１５０の一般化階層データ１５３でのテーブル名に対応している。一般化階層のデータは、一般化階層データ１５３の中にテーブルとして書き込まれる。一般化階層名は、そのテーブルの名前であり、例えば、データ名と準識別子名を一定の区切り文字で結合したものである。また、別々の準識別子の一般化階層のデータを同じテーブルに格納することもできる。この場合も一般化階層名はテーブル名に対応しており、同じテーブルに格納された一般化階層のデータに共通の一般化階層名が付与される。この場合の一般化階層名は、例えば、データ名の後に一定のサフィックスを結合したものとすることができる。
図２の一般化階層エディタ２７０上で、管理データ１５２のデータ名に紐づく一般化階層を編集することができる。 (Management data)
As shown in FIG. 11, the management data 152 includes a data name, a k anonymization threshold value, column information, and generalized hierarchy information.
The data name is the name of the data to be anonymized.
The k anonymization threshold is a threshold for anonymizing the anonymization target data 151 represented by the data name.
The column information is attribute information that the anonymization target data 151 has. The column information is a list of attribute information having a name and a type of the attribute and having a flag indicating whether it is a quasi-identifier (True if it is a quasi-identifier, False if not a quasi-identifier).
The generalized hierarchy information is a list of information including a quasi-identifier name indicating the name of the quasi-identifier and a generalized hierarchy name for the quasi-identifier. Here, the generalized hierarchy name corresponds to the table name in the generalized hierarchy data 153 of the personal information management apparatus 150. The data of the generalized hierarchy is written as a table in the generalized hierarchy data 153. The generalized hierarchy name is the name of the table, and is, for example, a combination of a data name and a quasi-identifier name with a certain delimiter. Also, data of generalized hierarchies with different quasi-identifiers can be stored in the same table. Also in this case, the generalized hierarchy name corresponds to the table name, and a common generalized hierarchy name is assigned to the data of the generalized hierarchy stored in the same table. The generalized hierarchy name in this case can be, for example, a combination of a fixed suffix after the data name.
The generalized hierarchy associated with the data name of the management data 152 can be edited on the generalized hierarchy editor 270 of FIG. 2.

（一般化階層データ）
一般化階層データ１５３は、図１２に示すように、準識別子名と、オブジェクトＩＤと、値と、頻度とを有する。一般化階層データ１５３は、準識別子名とオブジェクトＩＤをキーとする表である。
オブジェクトＩＤは、一般化階層の属性値または親子関係の識別子であり、属性値の識別子はＮＸＸＸの形式で、親子関係の識別子はＰＮＹＹＹの形式である。ここで、属性値の識別子「ＮＸＸＸ」に対する値は、その識別子「ＮＸＸＸ」で特定される属性値である。また、オブジェクトＩＤ「ＰＮＹＹＹ」に対する値は、ノード「ＮＹＹＹ」の親ノードのオブジェクトＩＤである。
頻度は、属性値の出現頻度を示す。出現頻度は、オブジェクトＩＤ「ＮＸＸＸ」に対してのみ存在し、オブジェクトＩＤ「ＰＮＹＹＹ」に対する出現頻度は空欄である。
例えば、図１２の１行目は、図２の一般化階層エディタ２７０中に示される一般化階層例におけるリーフレイヤの最も左のノードに対応しており、準識別子「国籍」のノード「Ｎ００１」は属性値「仏」を持ち、その出現頻度は「５」であることを表す。同様に図１２の２行目は図２の一般化階層エディタ２７０中に示される一般化階層例における第１レイヤの最も左のノードに対応しており、準識別子「国籍」のノード「Ｎ００２」は属性値「仏中」を持ち、その出現頻度は「１０」であることを表す。図１２の３行目はオブジェクトＩＤ「Ｎ００１」のノード「仏」の親がオブジェクトＩＤ「Ｎ００２」のノード「仏中」であることを表す。 (Generalized hierarchy data)
As shown in FIG. 12, the generalized hierarchy data 153 includes a quasi-identifier name, an object ID, a value, and a frequency. The generalized hierarchy data 153 is a table keyed by the quasi-identifier name and the object ID.
The object ID is an attribute value of the generalized hierarchy or an identifier of the parent-child relationship, the identifier of the attribute value is in the format of NXXX, and the identifier of the parent-child relationship is in the format of PNYYY. Here, the value of the attribute value identifier “NXXX” is the attribute value specified by the identifier “NXXX”. The value for the object ID “PNYYY” is the object ID of the parent node of the node “NYYY”.
The frequency indicates the appearance frequency of the attribute value. The appearance frequency exists only for the object ID “NXXX”, and the appearance frequency for the object ID “PNYYY” is blank.
For example, the first row of FIG. 12 corresponds to the leftmost node of the leaf layer in the generalized hierarchy example shown in the generalized hierarchy editor 270 of FIG. 2; it indicates that the node “N001” of the quasi-identifier “nationality” has the attribute value “France” and an appearance frequency of 5. Similarly, the second row of FIG. 12 corresponds to the leftmost node of the first layer in the same example; it indicates that the node “N002” of the quasi-identifier “nationality” has the attribute value “France-China” and an appearance frequency of 10. The third row of FIG. 12 indicates that the parent of the node “France” with object ID “N001” is the node “France-China” with object ID “N002”.
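To make the NXXX / PNYYY convention concrete, here is a small sketch that splits rows of the generalized hierarchy data into attribute-value nodes and parent links. The three rows are the ones described for FIG. 12; the function name `load_hierarchy` and the tuple layout of a row are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, str, Optional[int]]  # (quasi-identifier, object ID, value, frequency)


def load_hierarchy(rows: List[Row], quasi_identifier: str):
    """Split rows into {node ID: (value, frequency)} and {child ID: parent ID}."""
    nodes: Dict[str, Tuple[str, Optional[int]]] = {}
    parent_of: Dict[str, str] = {}
    for name, object_id, value, frequency in rows:
        if name != quasi_identifier:
            continue
        if object_id.startswith("PN"):
            # "PNYYY": the value is the object ID of the parent of node "NYYY".
            parent_of[object_id[1:]] = value
        elif object_id.startswith("N"):
            # "NXXX": the value is the attribute value; frequency is present.
            nodes[object_id] = (value, frequency)
    return nodes, parent_of


rows = [
    ("nationality", "N001", "France", 5),
    ("nationality", "N002", "France-China", 10),
    ("nationality", "PN001", "N002", None),  # frequency is blank for parent links
]
nodes, parent_of = load_hierarchy(rows, "nationality")
```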

（準識別子タプル頻度データ）
準識別子タプル頻度データ１５４は、図１３に示すように、匿名化対象データの準識別子の値の組み合わせと、頻度とを有する。頻度は、準識別子の値の組み合わせがその匿名化対象データに現れる回数を示す。準識別子タプル頻度データ１５４は、匿名化対象データ１５１を基に予め算出されている。
準識別子タプル頻度データ１５４のデータ形式は、一般化階層を用いて匿名化対象データをｋ匿名化して得られる匿名化データの準識別子の値の組み合わせに対する頻度を表現する匿名化後タプル頻度表にも使用される。 (Quasi-identifier tuple frequency data)
As shown in FIG. 13, the quasi-identifier tuple frequency data 154 includes a combination of quasi-identifier values of anonymization target data and a frequency. The frequency indicates the number of times that the combination of the quasi-identifier values appears in the anonymization target data. The quasi-identifier tuple frequency data 154 is calculated in advance based on the anonymization target data 151.
The data format of the quasi-identifier tuple frequency data 154 is also used for the post-anonymization tuple frequency table, which expresses the frequency of each combination of quasi-identifier values in the anonymized data obtained by k-anonymizing the anonymization target data using the generalized hierarchies.

次に、匿名化後頻度表算出部１１４と損失指標算出部１１５について説明するが、これらの両部の処理は、図２０の一般化階層作成処理の後で実行される。
（匿名化後頻度表算出処理）
図１４は、匿名化後頻度表算出部１１４における匿名化後頻度表算出処理の流れの一例を示す。
匿名化後頻度表算出部１１４は、図１２の一般化階層データ１５３と、図１３の準識別子タプル頻度データ１５４とをパーソナル情報管理装置１５０から主メモリに読み出す（Ｓ１０１）。そして、匿名化後頻度表算出部１１４は、読み出した準識別子タプル頻度データ１５４に含まれる全件のデータについて１件ずつ以下の処理を繰り返し、図１６に示す匿名化後タプル頻度表３００を算出する（Ｓ１０２）。
具体的には、匿名化後頻度表算出部１１４は、準識別子タプル頻度データ１５４から、現在のレコードのキーを取得する（Ｓ１０３）。このキーは、読み出した一般化階層データ１５３により構成される一般化階層におけるリーフノードの属性値を組み合わせたものである。例えば、図１３の（２７，日，地球物理学）というようなものになる。
次に、匿名化後頻度表算出部１１４は、このキーに含まれる属性値を、指定された各準識別子のレイヤの属性値に置き換えて新しいキーを作成する（Ｓ１０４）。なお、各準識別子のレイヤの指定は図４の匿名化実行ダイアログ２３１上で行うことができる。また、後述する一般化階層提示処理と一般化階層編集処理の実行中には、一般化階層エディタ２７０上に一般化階層が表示される準識別子についてはレイヤの指定がリーフレイヤからルートレイヤまで自動で更新される。例えば、前述のキー（２７，日，地球物理学）を、図１５に示す一般化階層の例を用いて、年齢・国籍および専攻の各属性値を第１レイヤの各属性値で置換したとき、置換されたキーは（２０代，日，理系）となる。
次に、匿名化後頻度表算出部１１４は、置換されたキーに対して匿名化後タプル頻度表３００に該当するキーのエントリがあるか確認し（Ｓ１０５）、エントリがなければ（Ｓ１０６：Ｙｅｓ）、頻度の値０でエントリを作成する（Ｓ１０７）。続いて、匿名化後頻度表算出部１１４は、置換されたキーに対する匿名化後タプル頻度表３００のエントリを取得し、その頻度の値に置換されたキーに対応する現在のレコードのキーに対する準識別子タプル頻度データ１５４の頻度の値を加算して、匿名化後タプル頻度表３００のエントリを更新する（Ｓ１０８）。これを準識別子タプル頻度データ１５４に含まれる全件のデータに対して繰り返す（Ｓ１０２）。
匿名化後頻度表算出部１１４は、例えば、図１３の準識別子タプル頻度データ１５４と図１５の一般化階層を用い、年齢・国籍及び専攻の一般化階層のレイヤをすべて第１レイヤで指定したとき、図１６に示す匿名化後タプル頻度表３００を出力する。 Next, the post-anonymization frequency table calculation unit 114 and the loss index calculation unit 115 will be described. The processing of both these units is executed after the generalized hierarchy creation processing of FIG.
(Anonymized frequency table calculation process)
FIG. 14 shows an exemplary flow of the post-anonymization frequency table calculation process in the post-anonymization frequency table calculation unit 114.
The post-anonymization frequency table calculation unit 114 reads the generalized hierarchy data 153 in FIG. 12 and the quasi-identifier tuple frequency data 154 in FIG. 13 from the personal information management device 150 to the main memory (S101). Then, the post-anonymization frequency table calculation unit 114 repeats the following processing one by one for all data included in the read quasi-identifier tuple frequency data 154, and calculates the post-anonymization tuple frequency table 300 shown in FIG. (S102).
Specifically, the post-anonymization frequency table calculation unit 114 acquires the key of the current record from the quasi-identifier tuple frequency data 154 (S103). This key is a combination of attribute values of leaf nodes in the generalized hierarchies built from the read generalized hierarchy data 153, for example (27, Japan, geophysics) in FIG. 13.
Next, the post-anonymization frequency table calculation unit 114 creates a new key by replacing each attribute value in this key with the attribute value at the layer specified for the corresponding quasi-identifier (S104). The layer for each quasi-identifier can be specified on the anonymization execution dialog 231 in FIG. 4. During the generalized hierarchy presentation process and the generalized hierarchy editing process described later, for the quasi-identifier whose generalized hierarchy is displayed on the generalized hierarchy editor 270, the specified layer is updated automatically, from the leaf layer through the root layer. For example, when each attribute value of age, nationality, and major in the key (27, Japan, geophysics) is replaced with the corresponding first-layer attribute value using the generalized hierarchy example shown in FIG. 15, the replaced key is (20s, Japan, science).
Next, the post-anonymization frequency table calculation unit 114 checks whether the post-anonymization tuple frequency table 300 has an entry for the replaced key (S105). If there is no entry (S106: Yes), it creates an entry with a frequency value of 0 (S107). It then obtains the entry for the replaced key from the post-anonymization tuple frequency table 300, adds to its frequency value the frequency value that the quasi-identifier tuple frequency data 154 holds for the key of the current record, and updates the entry (S108). This is repeated for all records in the quasi-identifier tuple frequency data 154 (S102).
For example, given the quasi-identifier tuple frequency data 154 in FIG. 13 and the generalized hierarchies in FIG. 15, with the first layer specified for all of the age, nationality, and major generalized hierarchies, the post-anonymization frequency table calculation unit 114 outputs the post-anonymization tuple frequency table 300 shown in FIG. 16.
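Steps S102 to S108 amount to re-keying each tuple at the specified layers and accumulating frequencies. The sketch below follows those steps under simplifying assumptions: each quasi-identifier's generalization at the specified layer is given as a plain dictionary from leaf value to generalized value, and all data other than the mapping of (27, Japan, geophysics) to (20s, Japan, science) is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Key = Tuple[Hashable, ...]


def anonymized_tuple_frequencies(tuple_freq: Dict[Key, int],
                                 layer_maps: List[Dict[Hashable, Hashable]]
                                 ) -> Dict[Key, int]:
    """Build the post-anonymization tuple frequency table."""
    table: Dict[Key, int] = {}
    for key, freq in tuple_freq.items():                         # S102, S103
        new_key = tuple(m[v] for m, v in zip(layer_maps, key))   # S104
        if new_key not in table:                                 # S105, S106
            table[new_key] = 0                                   # S107
        table[new_key] += freq                                   # S108
    return table


# Hypothetical tuple frequencies (age, nationality, major).
tuple_freq = {
    (27, "Japan", "geophysics"): 2,
    (25, "Japan", "physics"): 1,
    (31, "US", "law"): 3,
}
# Hypothetical first-layer generalizations, one dictionary per quasi-identifier.
layer_maps = [
    {27: "20s", 25: "20s", 31: "30s"},
    {"Japan": "Japan", "US": "US"},
    {"geophysics": "science", "physics": "science", "law": "humanities"},
]
table = anonymized_tuple_frequencies(tuple_freq, layer_maps)
```

With these inputs, the two Japanese records in their twenties collapse into a single key whose frequency is the sum of the originals.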

（損失指標算出部）
図１７は、損失指標算出部１１５における損失指標算出処理の流れの一例を示す。
損失指標算出部１１５の入力は、準識別子タプル頻度データ１５４、および匿名化後頻度表算出部１１４で求められた匿名化後タプル頻度表３００である。
損失指標算出部１１５は、まず、準識別子タプル頻度データ１５４を準識別子ごとに集計して全ての準識別子の属性値の出現頻度を求めることにより、準識別子ごとの頻度表Ｆ１，…，ＦＮを作成する（Ｓ２０１）。ここで、準識別子ごとの頻度表は、匿名化対象データ１５１と単一の準識別子を決めたときの、その準識別子の属性値と出現頻度との対応関係を示す。例えば、図１３の準識別子タプル頻度データ１５４からは図１８のような準識別子ごとの頻度表が得られる。
次に、損失指標算出部１１５は、匿名化後タプル頻度表３００から、各準識別子について指定されたレイヤを対象として準識別子ごとの匿名化後頻度表Ｇ１，…，ＧＮを求める（Ｓ２０２）。なお、各準識別子のレイヤの指定は図４の匿名化実行ダイアログ２３１上で行うことができる。また、後述する一般化階層提示処理と一般化階層編集処理の実行中には、一般化階層エディタ２７０上に一般化階層が表示される準識別子についてはレイヤの指定がリーフレイヤからルートレイヤまで自動で更新される。ここで、準識別子ごとの匿名化後頻度表は、匿名化データと単一の準識別子を決めたときの、該準識別子の匿名化属性値と出現頻度との対応関係を示す。この出現頻度は、匿名化後タプル頻度表３００から特定の準識別子について頻度を集計することによって求められる。例えば、図１３の準識別子タプル頻度データ１５４と図１５の一般化階層を用い、すべての一般化階層で第１レイヤを指定したとき、図１９のような準識別子ごとの匿名化後頻度表が得られる。
次に、損失指標算出部１１５は、準識別子ごとの頻度表Ｆ１，…，ＦＮと匿名化対象データの全レコード数Ｒから匿名化前情報量を求める。ここで、匿名化前情報量は、匿名化実行ダイアログ２３１等で全ての準識別子についてリーフレイヤが指定されたときの情報量に相当する。匿名化前情報量の算出式は以下の数１から数４のとおりである。なお、これ以降に現れる式において、対数関数ｌｏｇの底は２であるとするが、システム内で統一されていれば、１０やネイピア数などの１を超える任意の正の数でよい。 (Loss index calculation part)
FIG. 17 shows an example of the flow of loss index calculation processing in the loss index calculation unit 115.
The inputs of the loss index calculation unit 115 are the quasi-identifier tuple frequency data 154 and the post-anonymization tuple frequency table 300 obtained by the post-anonymization frequency table calculation unit 114.
The loss index calculation unit 115 first calculates the frequency tables F1,..., FN for each quasi-identifier by counting the quasi-identifier tuple frequency data 154 for each quasi-identifier and obtaining the appearance frequency of the attribute values of all quasi-identifiers. Create (S201). Here, the frequency table for each quasi-identifier indicates the correspondence between the attribute value of the quasi-identifier and the appearance frequency when the anonymization target data 151 and a single quasi-identifier are determined. For example, a frequency table for each quasi-identifier as shown in FIG. 18 is obtained from the quasi-identifier tuple frequency data 154 in FIG.
Next, the loss index calculation unit 115 obtains, from the post-anonymization tuple frequency table 300, post-anonymization frequency tables G1, …, GN for each quasi-identifier at the layer specified for that quasi-identifier (S202). The layer for each quasi-identifier can be specified on the anonymization execution dialog 231 in FIG. 4. During the generalized hierarchy presentation process and the generalized hierarchy editing process described later, for the quasi-identifier whose generalized hierarchy is displayed on the generalized hierarchy editor 270, the specified layer is updated automatically, from the leaf layer through the root layer. Here, the post-anonymization frequency table for a quasi-identifier gives, for the anonymized data and that single quasi-identifier, the correspondence between the anonymized attribute values and their appearance frequencies. The appearance frequency is obtained by totaling the frequencies for the quasi-identifier in question over the post-anonymization tuple frequency table 300. For example, when the quasi-identifier tuple frequency data 154 in FIG. 13 and the generalized hierarchies in FIG. 15 are used and the first layer is specified in every generalized hierarchy, the post-anonymization frequency tables for each quasi-identifier shown in FIG. 19 are obtained.
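Both S201 and S202 reduce a tuple frequency table to one frequency table per quasi-identifier by totaling over the other columns. A minimal sketch follows; the function name and the sample table are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Key = Tuple[Hashable, ...]


def per_identifier_tables(tuple_freq: Dict[Key, int], n: int) -> List[Counter]:
    """Return one frequency table per quasi-identifier position 0..n-1."""
    tables = [Counter() for _ in range(n)]
    for key, freq in tuple_freq.items():
        for i, value in enumerate(key):
            tables[i][value] += freq  # total the tuple frequency per value
    return tables


# Hypothetical post-anonymization tuple frequency table (age, nationality, major).
anonymized = {
    ("20s", "Japan", "science"): 3,
    ("20s", "US", "science"): 1,
    ("30s", "US", "humanities"): 3,
}
g_age, g_nationality, g_major = per_identifier_tables(anonymized, 3)
```

Applying the same function to the quasi-identifier tuple frequency data instead of the post-anonymization table would give the tables F1, …, FN of S201.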
Next, the loss index calculation unit 115 obtains the amount of information before anonymization from the frequency tables F1,... FN for each quasi-identifier and the total number of records R of the anonymization target data. Here, the amount of information before anonymization corresponds to the amount of information when leaf layers are designated for all quasi-identifiers in the anonymization execution dialog 231 and the like. Formulas for calculating the amount of information before anonymization are as shown in the following equations 1 to 4. In the following formulas, the base of the logarithmic function log is assumed to be 2. However, if it is unified in the system, it may be any positive number exceeding 1 such as 10 or the number of Napier.

Next, the loss index calculation unit 115 obtains the amount of information after anonymization from the post-anonymization frequency tables G1, ..., GN for each quasi-identifier and the total number of records R of the anonymization target data (S204). Here, the amount of information after anonymization is the amount of information when each quasi-identifier is anonymized at its designated layer. The layer of each quasi-identifier can be designated in the anonymization execution dialog 231 or the like. In addition, during execution of the generalized hierarchy presentation process and the generalized hierarchy editing process described later, the layer designation for the quasi-identifier whose generalized hierarchy is displayed in the generalized hierarchy editor 270 is automatically updated from the leaf layer through the root layer. The formulas for calculating the amount of information after anonymization are as shown in the following Equations 5 to 8.

Finally, the loss index calculation unit 115 obtains the information loss rate according to Equation 9 from the amounts of information before and after anonymization obtained so far (S205). The information loss rate is an example of the information loss index of the present invention.

For example, the information loss rate (12.91%) displayed in the loss index evaluation area 260 of the generalized hierarchy editing screen 200 of FIG. 2 is the value obtained when the first layer is designated for all of the quasi-identifiers “age”, “nationality”, and “major” and anonymization is performed.
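The calculation in steps S204 and S205 can be sketched as follows. Equations 1 to 9 are not reproduced in this text, so the sketch assumes that the amount of information is the base-2 Shannon entropy of each quasi-identifier's frequency table, summed over the quasi-identifiers, and that the loss rate is the relative decrease in that amount. Both forms are assumptions, and the function names `information_amount` and `information_loss_rate` are hypothetical.

```python
import math

def information_amount(freq_tables, total_records):
    """Sum of per-quasi-identifier Shannon entropies (base 2).

    ASSUMPTION: the patent's equations are not reproduced in this text;
    entropy is used here only as a plausible stand-in.
    """
    total = 0.0
    for table in freq_tables:
        for count in table.values():
            if count > 0:
                p = count / total_records
                total -= p * math.log2(p)
    return total

def information_loss_rate(before_tables, after_tables, total_records):
    """Relative decrease in information amount (assumed form of the rate)."""
    before = information_amount(before_tables, total_records)
    after = information_amount(after_tables, total_records)
    return (before - after) / before if before > 0 else 0.0

# Leaf-layer table F and a generalized table G for one quasi-identifier.
F = [{"France": 5, "China": 5, "UK": 15}]
G = [{"France/China": 10, "UK": 15}]
rate = information_loss_rate(F, G, total_records=25)
```

Under these assumptions, merging two values into one node lowers the entropy, so the rate grows as higher layers are chosen and is zero when the leaf layer is kept for every quasi-identifier.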

(Generalized hierarchy creation process)
FIG. 20 shows an example of the flow of the generalized hierarchy creation process.
When the user selects the “data-import” menu 220 on the generalized hierarchy editing screen 200 displayed on the generalized hierarchy editing accepting apparatus 170 (S301), the anonymization target data import dialog 221 of FIG. 3 opens (S302). In the anonymization target data import dialog 221, the user enters a data name, the path of a data file, a k-anonymization threshold, and column information. As the column information, the user specifies the name and type of each column of the anonymization target data to be created from the data file being imported, and further specifies which of those columns are to be used as quasi-identifiers. When the user has entered the data name, the data file path, the k-anonymization threshold, and the column information and presses the import button, the generalized hierarchy editing accepting apparatus 170 reads the data file from the specified path and transmits an anonymization target data import request containing the data name, the data file, the k-anonymization threshold, and the column information to the anonymization apparatus 101 (S303).
When the anonymization apparatus 101 receives the anonymization target data import request containing the data name, the data file, the k-anonymization threshold, and the column information (S304), the generalized hierarchy creating unit 111 creates, from the received data file, anonymization target data containing combinations of attribute values of the plurality of quasi-identifiers to be anonymized, and writes the created data as the anonymization target data 151 of the personal information management device 150 (S305).
Next, based on the anonymization target data 151 and the quasi-identifiers specified by the column information, the generalized hierarchy creating unit 111 creates a tree-structured generalized hierarchy for each quasi-identifier and writes it as the generalized hierarchy data 153 of the personal information management device 150 (S306). Each generalized hierarchy consists of a lowermost layer containing a plurality of nodes, each holding an attribute value included in the anonymization target data 151 and its appearance frequency, and of layers each containing one or more nodes that hold an attribute value whose degree of anonymization is the same as or higher than that of the layer below, together with its appearance frequency. The generalized hierarchy is generated automatically here using a technique such as that shown in Patent Document 1.
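The tree structure created in step S306 can be illustrated with a minimal sketch. The generation technique of Patent Document 1 is not described in this text, so the sketch only builds the leaf layer from the data and places a single root node above it. The names `Node` and `build_minimal_hierarchy` and the root label "*" are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    value: str                 # attribute value shown for this node
    frequency: int = 0         # appearance frequency in the target data
    children: list = field(default_factory=list)

def build_minimal_hierarchy(records, column):
    """Build a two-layer tree: one leaf per distinct value, plus a root.

    ASSUMPTION: a real system would insert intermediate layers; the
    generation technique of Patent Document 1 is not reproduced here.
    """
    counts = Counter(record[column] for record in records)
    leaves = [Node(value, freq) for value, freq in sorted(counts.items())]
    root = Node("*", sum(leaf.frequency for leaf in leaves), leaves)
    return root

records = [
    {"age": "31", "nationality": "France"},
    {"age": "34", "nationality": "China"},
    {"age": "31", "nationality": "France"},
]
root = build_minimal_hierarchy(records, "nationality")
```

Each leaf carries the frequency of its value in the anonymization target data, and the root carries the total number of records, which matches the property that an upper node is at least as anonymized as the nodes below it.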

Then, the generalized hierarchy creating unit 111 obtains quasi-identifier tuple frequency data from the anonymization target data 151 and writes it to the quasi-identifier tuple frequency data 154 of the personal information management device 150 (S307). Further, the generalized hierarchy creating unit 111 writes the management data 152 of the personal information management device 150 based on the received data name, k-anonymization threshold, and column information (S308). Here, the generalized hierarchy name is, for example, the name of the table to which the generalized hierarchy data 153 of the personal information management device 150 was written, but it is not limited to this.
When this series of processes is complete, the generalized hierarchy creating unit 111 notifies the generalized hierarchy editing accepting apparatus 170 that the generalized hierarchy creation process has completed (S309). On receiving the notification, the generalized hierarchy editing accepting apparatus 170 closes the anonymization target data import dialog 221 (S310).
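As an illustration of step S307, quasi-identifier tuple frequency data can be thought of as a count of each distinct combination of quasi-identifier values. The following is a minimal sketch; the function name `tuple_frequencies`, the column names, and the sample records are made up for the example.

```python
from collections import Counter

def tuple_frequencies(records, quasi_identifiers):
    """Count how often each combination of quasi-identifier values occurs."""
    return Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )

records = [
    {"name": "A", "age": "31", "nationality": "France", "major": "Physics"},
    {"name": "B", "age": "31", "nationality": "France", "major": "Physics"},
    {"name": "C", "age": "45", "nationality": "UK", "major": "Law"},
]
# Only the columns designated as quasi-identifiers take part in the tuple.
freq = tuple_frequencies(records, ["age", "nationality", "major"])
```

Columns that are not quasi-identifiers (here `name`) are ignored, so the table is typically much smaller than the original data.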

(Generalized hierarchy presentation process)
FIG. 21 shows an example of the flow of the generalized hierarchy presentation process.
The anonymization setting display area 250 of the generalized hierarchy editing screen 200 of FIG. 2 displays the data name and the k-anonymization threshold set in the anonymization target data import dialog 221, together with a list of the attributes and a list of the quasi-identifiers of the anonymization target data. When the user selects a quasi-identifier name in the anonymization setting display area 250 (S401), the generalized hierarchy editing accepting apparatus 170 transmits a generalized hierarchy read request containing the quasi-identifier name to the anonymization apparatus 101 (S402).
When the anonymization apparatus 101 receives the generalized hierarchy read request (S403), the generalized hierarchy presentation unit 112 reads the generalized hierarchy from the generalized hierarchy data 153 of the personal information management device 150, converts it into a tree data structure, and writes it to the temporary storage unit 120 (S404).
Next, the generalized hierarchy presentation unit 112 aggregates the quasi-identifier tuple frequency data 154 for each quasi-identifier and obtains the appearance frequency of attribute values for all quasi-identifiers (S405). Note that step S405 is the same processing as step S201, and the appearance frequency of the attribute values of each quasi-identifier corresponds to the appearance frequency of the leaf nodes in the generalized hierarchy.
Then, for the generalized hierarchy of the selected quasi-identifier included in the generalized hierarchy read request (that is, the generalized hierarchy of the quasi-identifier displayed in the generalized hierarchy editor 270), the generalized hierarchy presentation unit 112 uses the leaf-node appearance frequencies obtained in step S405 to obtain the appearance frequencies of the attribute values of the nodes above the leaf nodes (S406). For example, in the generalized hierarchy displayed in the generalized hierarchy editor 270 of the generalized hierarchy editing screen 200 of FIG. 2, the frequency 10 of the first-layer node “France/China” is obtained by adding the frequency 5 of the zeroth-layer node “France” and the frequency 5 of the zeroth-layer node “China”, and the frequency 25 of the second-layer node “France/China/UK” is obtained by adding the frequency 10 of the first-layer node “France/China” and the frequency 15 of “UK”.
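Steps S405 and S406 can be sketched as two small functions: one aggregates the tuple frequency data per quasi-identifier to get leaf frequencies, and one sums child frequencies upward. The leaf frequencies (France 5, China 5, UK 15) follow the example in the text; the tuple counts, the dictionary-based tree representation, and the function names are assumptions made for this sketch.

```python
from collections import Counter

def leaf_frequencies(tuple_freq, index):
    """S405-style aggregation: per-value counts for one quasi-identifier."""
    counts = Counter()
    for values, freq in tuple_freq.items():
        counts[values[index]] += freq
    return counts

def roll_up(tree, leaf_freq):
    """S406-style roll-up: a node's frequency is the sum of its children's.

    `tree` maps each non-leaf node name to its list of child names
    (a hypothetical representation chosen for this sketch).
    """
    result = dict(leaf_freq)

    def freq(node):
        if node not in result:
            result[node] = sum(freq(child) for child in tree.get(node, []))
        return result[node]

    for node in tree:
        freq(node)
    return result

# Tuples of (age, nationality); the counts are made up to match the example.
tuple_freq = {("20s", "France"): 5, ("20s", "China"): 5,
              ("20s", "UK"): 10, ("30s", "UK"): 5}
leaves = leaf_frequencies(tuple_freq, index=1)
tree = {"France/China": ["France", "China"],
        "France/China/UK": ["France/China", "UK"]}
frequencies = roll_up(tree, leaves)
```

With these inputs the roll-up gives 10 for “France/China” and 25 for “France/China/UK”, the same values as in the example of FIG. 2.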

Next, the loss index calculation unit 115 obtains the information loss rate IL of each layer of the generalized hierarchy displayed in the generalized hierarchy editor 270 of the generalized hierarchy editing screen 200 (S407). Here, the generalized hierarchy editor 270 displays the generalized hierarchy of the selected quasi-identifier included in the generalized hierarchy read request. The loss index calculation unit 115 holds each quasi-identifier other than the selected one at the layer designated in the anonymization execution dialog 231 (that is, the layer displayed in the loss index evaluation area 260), and obtains the information loss rate of each layer while varying the layer of the selected quasi-identifier. For example, in the case displayed in the generalized hierarchy editor 270 of FIG. 2, the information loss rate with age and major fixed at the first layer is obtained for each layer of nationality. For this purpose, the quasi-identifier tuple frequency data 154 and the anonymized tuple frequency table 300 are given to the loss index calculation unit 115 as inputs. For the selected quasi-identifier, the anonymized tuple frequency table 300 is obtained and used for every layer of the generalized hierarchy. For the other quasi-identifiers, the anonymized tuple frequency table 300 obtained for the layer displayed in the loss index evaluation area 260 is used.
Then, the generalized hierarchy presentation unit 112 transmits the generalized hierarchy held in the temporary storage unit 120, including the appearance frequency of each node, together with the information loss rate of each layer, to the generalized hierarchy editing accepting apparatus 170 (S408). The generalized hierarchy editing accepting apparatus 170 displays the received generalized hierarchy and the information loss rate of each layer in the generalized hierarchy editor 270 (S409).
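The idea of varying only the selected quasi-identifier's layer in step S407 can be sketched as follows. The sketch builds one anonymized tuple frequency table per layer of the selected quasi-identifier while the other quasi-identifiers stay at a fixed layer; each table would then be passed to the loss index calculation. The mapping-based representation of a layer, the function names, and the sample values are assumptions for illustration.

```python
from collections import Counter

def anonymized_tuple_table(tuple_freq, mappings):
    """Replace each value through its mapping and re-count the tuples.

    `mappings[i]` maps a leaf value of quasi-identifier i to the node
    value at the layer chosen for that quasi-identifier.
    """
    table = Counter()
    for values, freq in tuple_freq.items():
        table[tuple(m[v] for m, v in zip(mappings, values))] += freq
    return table

def tables_per_layer(tuple_freq, fixed_mappings, selected, layer_mappings):
    """One table per layer of the selected quasi-identifier, others fixed."""
    tables = []
    for layer_map in layer_mappings:
        mappings = list(fixed_mappings)
        mappings[selected] = layer_map
        tables.append(anonymized_tuple_table(tuple_freq, mappings))
    return tables

tuple_freq = {("31", "France"): 2, ("34", "China"): 1, ("38", "UK"): 3}
age_fixed = {"31": "30s", "34": "30s", "38": "30s"}  # age held at one layer
nationality_layers = [
    {"France": "France", "China": "China", "UK": "UK"},               # layer 0
    {"France": "France/China", "China": "France/China", "UK": "UK"},  # layer 1
]
tables = tables_per_layer(tuple_freq, [age_fixed, None], 1, nationality_layers)
```

The entry for the selected quasi-identifier in `fixed_mappings` is a placeholder, because it is overwritten for every layer.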

(Generalized hierarchy editing process)
FIG. 22 shows an example of the flow of the generalized hierarchy editing process.
As described above, editing operations such as “node name change”, “node movement”, “layer addition (above)”, “layer addition (below)”, and “layer deletion” can be performed in the generalized hierarchy editor 270 of the generalized hierarchy editing screen 200.
When the generalized hierarchy editing accepting apparatus 170 accepts an editing operation such as node movement, it transmits an editing request containing the designation of that editing operation to the anonymization apparatus 101 (S501).
When the anonymization apparatus 101 receives the editing request (S502), the generalized hierarchy editing unit 113 applies the editing operation designated by the editing request to the data structure of the generalized hierarchy stored in the temporary storage unit 120, thereby updating the generalized hierarchy in the temporary storage unit 120 (S503).
Subsequently, the generalized hierarchy editing unit 113 and related units perform the same processing as steps S405 to S407 described above to recalculate the appearance frequency of each node and the information loss rate IL of each layer of the generalized hierarchy displayed in the generalized hierarchy editor 270 of the generalized hierarchy editing screen 200, and then perform the same processing as steps S408 and S409 described above to display the generalized hierarchy in the generalized hierarchy editor 270 of the generalized hierarchy editing screen 200 on the generalized hierarchy editing accepting apparatus 170.
Specifically, the generalized hierarchy editing unit 113 aggregates the quasi-identifier tuple frequency data 154 for each quasi-identifier and obtains the appearance frequency of attribute values (the appearance frequency of leaf nodes in the generalized hierarchy) for all quasi-identifiers (S504, same as S405). Then, for the generalized hierarchy of the quasi-identifier displayed in the generalized hierarchy editor 270, the generalized hierarchy editing unit 113 uses the leaf-node appearance frequencies obtained in step S504 to obtain the appearance frequencies of the attribute values of the nodes above the leaf nodes (S505, same as S406).
Next, the loss index calculation unit 115 obtains the information loss rate IL of each layer of the generalized hierarchy displayed in the generalized hierarchy editor 270 of the generalized hierarchy editing screen 200 (S506, same as S407).
Then, the generalized hierarchy editing unit 113 transmits the generalized hierarchy held in the temporary storage unit 120, including the appearance frequency of each node, together with the information loss rate of each layer, to the generalized hierarchy editing accepting apparatus 170 (S507, same as S408). The generalized hierarchy editing accepting apparatus 170 displays the received generalized hierarchy and the information loss rate of each layer in the generalized hierarchy editor 270 (S508, same as S409).
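A “node movement” edit followed by the frequency recalculation of steps S503 to S505 can be sketched as below. The parent-to-children dictionary, the node names “Europe”, “Asia”, and “World”, and the function names are hypothetical and chosen only to keep the example short.

```python
def move_node(children, node, new_parent):
    """Detach `node` from its current parent and attach it to `new_parent`.

    `children` maps a parent name to the list of its child names
    (hypothetical representation; every node keeps exactly one parent,
    so the result is still a tree).
    """
    for kids in children.values():
        if node in kids:
            kids.remove(node)
    children.setdefault(new_parent, []).append(node)

def frequency(children, leaf_freq, node):
    """Recompute a node's frequency as the sum over its subtree's leaves."""
    if node in leaf_freq:
        return leaf_freq[node]
    return sum(frequency(children, leaf_freq, c) for c in children.get(node, []))

leaf_freq = {"France": 5, "China": 5, "UK": 15}
children = {"Europe": ["France"], "Asia": ["China", "UK"],
            "World": ["Europe", "Asia"]}
move_node(children, "UK", "Europe")   # the user moves "UK" under "Europe"
```

Because leaf frequencies come from the data and do not change with an edit, only the upper nodes need to be recomputed after the move; the root total stays the same.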

(Generalized hierarchy saving process)
FIG. 23 shows an example of the flow of the generalized hierarchy saving process in the generalized hierarchy storage unit 116.
When the “generalized hierarchy-save” menu 240 is selected on the generalized hierarchy editing screen 200 displayed on the generalized hierarchy editing accepting apparatus 170, the generalized hierarchy editing accepting apparatus 170 transmits to the anonymization apparatus 101 a save request for saving the generalized hierarchy of the quasi-identifier displayed in the generalized hierarchy editor 270 (S601).
When the anonymization apparatus 101 receives the save request (S602), the generalized hierarchy storage unit 116 creates generalized hierarchy data 153 having the structure of FIG. 12 from the tree-structured generalized hierarchy stored in the temporary storage unit 120, and writes it to the generalized hierarchy data 153 of the personal information management device 150 (S603).

(Anonymization process)
FIG. 24 shows an example of the flow of the anonymization process in the anonymization processing unit 117.
When the “generalized hierarchy-save” menu 240 is selected on the generalized hierarchy editing screen 200 displayed on the generalized hierarchy editing accepting apparatus 170 (S701), the anonymization execution dialog 231 of FIG. 4 opens (S702). In the anonymization execution dialog 231, a layer of the generalized hierarchy can be entered for each quasi-identifier. The user evaluates the information loss rate by referring to the loss index evaluation area 260 and designates, in the anonymization execution dialog 231, the layers judged to be most appropriate. When the user enters the generalized hierarchy layer of each quasi-identifier in the anonymization execution dialog 231 and presses the anonymization execution button (S703), the generalized hierarchy editing accepting apparatus 170 transmits an anonymization request containing the generalized hierarchy layer of each quasi-identifier to the anonymization apparatus 101 (S704).
When the anonymization apparatus 101 receives the anonymization request (S705), the anonymization processing unit 117 creates anonymized data in which the attribute value of each quasi-identifier included in the anonymization target data is replaced with the attribute value of the node belonging to the generalized hierarchy layer of that quasi-identifier specified in the anonymization request (S706).
Next, the anonymization processing unit 117 calculates the appearance frequency of each combination of attribute values of the quasi-identifiers included in the anonymized data (S707). The anonymization processing unit 117 deletes from the anonymized data the records whose calculated appearance frequency is less than the k-anonymization threshold, thereby modifying the anonymized data so that the appearance frequency of each combination of attribute values of the quasi-identifiers included in the anonymized data satisfies the k-anonymization threshold specified by the user (S708). Then, the anonymization processing unit 117 writes to the personal information management device 150 the anonymized data 155, which contains the combinations of attribute values of the quasi-identifiers after anonymization and their appearance frequencies and satisfies the k-anonymization threshold (S709). The structure of the anonymized data 155 is the same as that of the anonymized tuple frequency table 300.
When this series of processes is complete, the anonymization processing unit 117 notifies the generalized hierarchy editing accepting apparatus 170 that the anonymization process has completed (S710).
When the personal information contained in the data file is to be anonymized, the pre-anonymization attribute values of the quasi-identifiers corresponding to the attribute values of the quasi-identifiers contained in the anonymized data 155 are first obtained from the generalized hierarchy data 153. The pre-anonymization attribute values of the quasi-identifiers in the data file are then replaced with the post-anonymization attribute values.
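Steps S706 to S708 can be sketched as follows: generalize each quasi-identifier value to the node at the chosen layer, count the resulting combinations, and delete records whose combination occurs fewer than k times. The function name `anonymize`, the per-quasi-identifier mapping dictionaries, and the sample data are assumptions for this sketch, which works on individual records rather than on the tuple frequency table.

```python
from collections import Counter

def anonymize(records, quasi_identifiers, mappings, k):
    """Generalize quasi-identifier values, then drop rare combinations.

    `mappings[q]` maps each original value of quasi-identifier q to the
    node value at the layer chosen by the user (hypothetical structure).
    """
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    # S706: replace each value with the node value at the chosen layer.
    generalized = [
        {**r, **{q: mappings[q][r[q]] for q in quasi_identifiers}}
        for r in records
    ]
    # S707: appearance frequency of each combination of attribute values.
    counts = Counter(key(r) for r in generalized)
    # S708: delete records whose combination appears fewer than k times.
    return [r for r in generalized if counts[key(r)] >= k]

records = [
    {"age": "31", "nationality": "France"},
    {"age": "34", "nationality": "China"},
    {"age": "38", "nationality": "UK"},
    {"age": "52", "nationality": "UK"},
]
mappings = {
    "age": {"31": "30s", "34": "30s", "38": "30s", "52": "50s"},
    "nationality": {"France": "France/China", "China": "France/China",
                    "UK": "UK"},
}
result = anonymize(records, ["age", "nationality"], mappings, k=2)
```

In this example only the two records generalized to (“30s”, “France/China”) remain, because each of the other combinations appears once and falls below k = 2.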

As described above, according to the present invention, starting from a generalized hierarchy template based on the anonymization target data, the generalized hierarchy can be edited interactively while checking the amount of information lost. This lets the user identify the optimal anonymization plan while editing the generalized hierarchy. By using the generalized hierarchy edited in this way to anonymize the anonymization target data according to the anonymization plan the user wants, a generalized hierarchy and anonymized data suited to the analysis purpose can be obtained efficiently and in a form the user can readily accept.

DESCRIPTION OF SYMBOLS
100 ... personal information anonymization system, 101 ... anonymization apparatus, 110 ... user interface unit, 111 ... generalized hierarchy creating unit, 112 ... generalized hierarchy presentation unit, 113 ... generalized hierarchy editing unit, 114 ... post-anonymization frequency table calculation unit, 115 ... loss index calculation unit, 116 ... generalized hierarchy storage unit, 117 ... anonymization processing unit, 120 ... temporary storage unit, 150 ... personal information management device, 151 ... anonymization target data, 152 ... management data, 153 ... generalized hierarchy data, 154 ... quasi-identifier tuple frequency data, 155 ... anonymized data, 170 ... generalized hierarchy editing accepting apparatus, 180 ... network, 200 ... generalized hierarchy editing screen, 210 ... menu bar, 220 ... “data-import” menu, 221 ... anonymization target data import dialog, 230 ... “data-anonymization” menu, 231 ... anonymization execution dialog, 240 ... “generalized hierarchy-save” menu, 250 ... anonymization setting display area, 260 ... loss index evaluation area, 270 ... generalized hierarchy editor, 300 ... anonymized tuple frequency table

Claims

1. A personal information anonymization system comprising:
generalized hierarchy creating means for creating, for each quasi-identifier, based on anonymization target data including combinations of attribute values of a plurality of quasi-identifiers to be anonymized, a tree-structured generalized hierarchy composed of a lowermost layer including a plurality of nodes each having an attribute value included in the anonymization target data and the appearance frequency of that attribute value, and a layer including one or more nodes each having an attribute value whose degree of anonymization is the same as or higher than that of the lower layer and the appearance frequency of that attribute value;
generalized hierarchy presenting means for presenting to a user, in response to designation of a quasi-identifier by the user, the generalized hierarchy of the designated quasi-identifier;
generalized hierarchy editing means for updating the generalized hierarchy presented by the generalized hierarchy presenting means in accordance with an editing instruction from the user while maintaining the tree structure, and re-presenting the updated generalized hierarchy to the user; and
anonymization means for creating, in response to designation by the user of a layer of the generalized hierarchy of each quasi-identifier, anonymized data in which the attribute value of each quasi-identifier included in the anonymization target data is replaced with the attribute value of a node belonging to the designated layer of the generalized hierarchy of that quasi-identifier.

2. The personal information anonymization system according to claim 1, further comprising loss index calculating means for determining an information loss index of each layer of the generalized hierarchy,
wherein the generalized hierarchy presenting means and the generalized hierarchy editing means present to the user the information loss index, obtained by the loss index calculating means, of each layer included in the generalized hierarchy.

3. The personal information anonymization system according to claim 1 or 2, wherein the anonymization means modifies the anonymized data so that the appearance frequency of each combination of attribute values of the quasi-identifiers included in the anonymized data satisfies a k-anonymization threshold specified by the user.

4. The personal information anonymization system according to claim 1, wherein the editing in the generalized hierarchy editing means includes node name change, node movement, layer addition, and layer deletion.