WO2026009409A1

WO2026009409A1 - Secure computation system and record extraction method

Info

Publication number: WO2026009409A1
Application number: PCT/JP2024/024395
Authority: WO
Inventors: 哲士田中; 亮菊池
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2024-07-05
Filing date: 2024-07-05
Publication date: 2026-01-08
Anticipated expiration: 2027-01-05

Abstract

This secure computation system is composed of one or more secure computation servers. Each secure computation server comprises a random number assignment unit, a combination unit, a sorting unit, a comparison unit, a shuffling unit, and a deletion unit. The random number assignment unit associates a share [r_n] obtained by secretly sharing a random number r_n with each record. The combination unit finds a share [c_n∥r_n] of a combination c_n∥r_n of a class c_n in the record and a random number r_n associated with the record. The sorting unit finds a sorted share [c_n∥r_n] obtained by sorting the combination c_n∥r_n. The comparison unit compares the sorted class c_n with a class c_n-k, and associates the share of a comparison result with the share of the record. The deletion unit deletes the share of the record with which the comparison result indicating a match is associated.

Description

Secure computing system, record extraction method

　本発明は、複数の秘密計算サーバで構成される秘密計算システムとレコード抽出方法に関する。 The present invention relates to a secure computation system consisting of multiple secure computation servers and a record extraction method.

　ＡＩ等でのデータ利活用の機運が高まる一方、プライバシー保護の要求も高まっており、医療分野等で機密性の高いデータを実績のあるＡＩアルゴリズムで分析できる環境が求められている。秘密計算ＡＩは、複数企業の蓄積するデータを持ち寄って、安全に利活用するための技術である。データを秘匿化（秘密分散）したまま一度も復元することなく、極めて安全で多様なＡＩアルゴリズムによるモデルの作成、推論（計算結果以外は誰にも見えないデータ運用）が可能となる。秘密計算ＡＩは、データを秘匿したまま利用可能な統計分析・機械学習サービスを実現できるが、データを見ずに分析等処理を実施するため、平文と比較して非常に効率が悪く、これまで現実的な利用条件（許容範囲内の計算時間）で動作するシステムが実装された例はない。 While momentum for data utilization using AI and other technologies is growing, so too is the demand for privacy protection, creating a need for an environment where highly confidential data can be analyzed using proven AI algorithms in fields such as healthcare. Secure computing AI is a technology that brings together data accumulated by multiple companies and makes safe use of it. It enables the creation of models and inference (data manipulation in which no one can see anything but the calculation results) using extremely secure and diverse AI algorithms, while keeping the data confidential (secret distribution) and never restoring it. Secure computing AI can realize statistical analysis and machine learning services that can be used while keeping the data confidential, but because analysis and other processing is performed without seeing the data, it is extremely inefficient compared to plain text, and to date, there have been no examples of a system being implemented that operates under realistic usage conditions (acceptable calculation time).

　秘密計算ＡＩを秘密計算そのままの形態で実装するのは難しい。つまりモデルの種別、パラメータの深さ、構造情報他、完全な秘密計算でやるためには、すべての学習を抽象化することで長時間に渡る計算が必要となり、ユーザビリティを損ねるため非現実的である。このため実用的な計算速度で秘密計算できる技術が求められている。許容可能な時間内に実行するため、学習に用いるレコードを抽出・選別して効率的な探索を行う必要がある。例えば、レコード数の少ないクラスのデータを漏れなく抽出する方法として層化抽出法（非特許文献１，２）がある。 It is difficult to implement secure computation AI in the form of secure computation itself. In other words, to perform fully secure computation, taking into account model type, parameter depth, structural information, etc., abstracting all learning would require lengthy calculations, which would impair usability and make it unrealistic. For this reason, there is a demand for technology that can perform secure computation at a practical calculation speed. To execute within an acceptable timeframe, it is necessary to extract and select records to be used for learning and perform an efficient search. For example, stratified sampling (Non-Patent Documents 1 and 2) is a method for thoroughly extracting data from classes with a small number of records.

吉澤寛之，吉田俊和，中島誠，吉田琢哉，原田知佳，“地域住民の関与・雰囲気が集合的有能感を介して子どもの反社会性に及ぼす影響　―層化抽出法を用いたマルチレベル分析による検討―（Effects of Neighborhood Commitment and Atmosphere on Youth Antisociality Mediated by Collective Efficacy: A Multilevel Study with Stratified Sampling）”，応用心理学研究, Vol.45, No.1, pp.35-46, 2019.Yoshizawa Hiroyuki, Yoshida Toshikazu, Nakajima Makoto, Yoshida Takuya, Harada Chika, "Effects of Neighborhood Commitment and Atmosphere on Youth Antisociality Mediated by Collective Efficacy: A Multilevel Study with Stratified Sampling," Applied Psychology Research, Vol. 45, No. 1, pp. 35-46, 2019. Sechidis, K., Tsoumakas, G., Vlahavas, I. "On the Stratification of Multi-label Data", In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6913. Springer, Berlin, Heidelberg，［令和６年６月１３日検索］、インターネット<https://doi.org/10.1007/978-3-642-23808-6_10>.Sechidis, K., Tsoumakas, G., Vlahavas, I. "On the Stratification of Multi-label Data", In: Gunopoulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol. 6913. Springer, Berlin, Heidelberg, [Retrieved June 13, 2024], Internet <https://doi.org/10.1007/978-3-642-23808-6_10>.

　しかしながら、非特許文献１，２に示されたようなレコード数の少ないクラスのデータを漏れなく抽出する方法を、秘密計算ＡＩに単純に適用することはできない。そこで、本発明は、秘密計算を用いてレコード数の少ないクラスのデータを漏れなく抽出する方法を提供することを目的とする。 However, the methods for extracting data from classes with a small number of records without omission, as shown in Non-Patent Documents 1 and 2, cannot be simply applied to secure computing AI. Therefore, the present invention aims to provide a method for extracting data from classes with a small number of records without omission, using secure computing.

　本発明の秘密計算システムは、１台以上の秘密計算サーバで構成される。各秘密計算サーバは、記録部、乱数付与部、結合部、ソート部、比較部、シャッフル部、削除部を備える。乱数付与部は、乱数ｒ_ｎを秘密分散したシェア［ｒ_ｎ］をレコードごとに対応付ける。結合部は、レコード内のクラスｃ_ｎとレコードと対応付けられた乱数ｒ_ｎとの結合ｃ_ｎ∥ｒ_ｎのシェア［ｃ_ｎ∥ｒ_ｎ］を求める。ソート部は、結合ｃ_ｎ∥ｒ_ｎをソートしたソート後シェア［ｃ_ｎ∥ｒ_ｎ］を求める。比較部は、ソート後のクラスｃ_ｎとクラスｃ_ｎ－ｋを比較し、比較結果のシェアをレコードのシェアに対応付ける。シャッフル部は、比較結果をシャッフルし、シャッフル後の比較結果のシェアを求める。削除部は、一致を意味する比較結果が対応付けられているレコードのシェアを削除する。秘密計算サーバのいずれか、もしくは当該秘密計算システムに接続された秘密計算クライアント装置が、シャッフル後の比較結果のシェアを用いてシャッフル後の比較結果を復元する復元部を備える。 The secure computation system of the present invention is composed of one or more secure computation servers. Each secure computation server includes a recording unit, a random number assignment unit, a combining unit, a sorting unit, a comparison unit, a shuffling unit, and a deleting unit. The random number assignment unit associates a share [r _n ] obtained by secretly sharing a random number r _n with each record. The combining unit obtains a share [c n ∥r _n ] of the combination c _n ∥r _n between a class c _n in the record and the random number r _n associated with the record. The sorting unit obtains a post-sorted share [ _c _n ∥r _n ] by sorting the combination c _n ∥r _n . The comparison unit compares the sorted class c _n with the class c _n-k and associates the share of the comparison result with the share of the record. The shuffling unit shuffles the comparison result and obtains the share of the shuffled comparison result. The deleting unit deletes the share of the record associated with a comparison result indicating a match. Any of the secure computation servers or a secure computation client device connected to the secure computation system includes a restoration unit that restores the post-shuffle comparison result using the share of the post-shuffle comparison result.

　本発明の秘密計算システムによれば、指定した属性のクラス比率が均等になるようにレコードを抽出するので、秘密計算を用いてレコード数の少ないクラスのデータを漏れなく抽出する方法を提供できる。 The secure computation system of the present invention extracts records so that the class ratio of the specified attribute is equal, providing a method for using secure computation to thoroughly extract data from classes with a small number of records.

ネットワークを介して接続された複数の秘密計算サーバで構成される秘密計算システムと秘密計算クライアント装置の構成例を示す図。FIG. 1 is a diagram showing an example of the configuration of a secure computation system made up of multiple secure computation servers connected via a network and a secure computation client device. 秘密計算サーバの機能構成例を示す図。FIG. 2 is a diagram showing an example of the functional configuration of a secure computation server. 実施例１のレコード抽出方法の処理フローを示す図。FIG. 4 is a diagram showing the processing flow of the record extraction method according to the first embodiment. レコードの抽出方法のイメージを説明するための図。FIG. 10 is a diagram for explaining an image of a record extraction method. レコードの例を示す図。FIG. 10 is a diagram showing an example of a record. レコードごとに乱数を生成し属性Ｙのデータに乱数を結合した例を示す図。FIG. 10 is a diagram showing an example in which a random number is generated for each record and is linked to data of attribute Y. 結合ｃ_ｎ∥ｒ_ｎに基づいて、レコードをソートした結果を示す図。FIG. 10 shows the results of sorting records based on the join c _n ∥r _n . ソート後にｋ個前のデータと比較した結果を示す図。FIG. 10 is a diagram showing the results of comparison with k previous data after sorting. シャッフルは行っていない状態で比較結果が”１”のレコードを削除した例を示す図。FIG. 10 is a diagram showing an example in which records with a comparison result of "1" are deleted without shuffling. 変形例１のレコード抽出方法の処理フローを示す図。FIG. 10 is a diagram showing the processing flow of a record extraction method according to Modification 1. コンピュータの機能構成例を示す図。FIG. 2 is a diagram showing an example of the functional configuration of a computer.

　実施形態の説明に先立ち、本開示で利用する基本的な技術概念を説明する。 Before explaining the embodiments, we will explain the basic technical concepts used in this disclosure.

［秘密分散技術］
　秘密分散とは、データを複数の分散値に変換し、一定個数以上の分散値を用いれば元のデータを復元でき、一定個数未満の分散値からは元のデータを一切復元できなくする技術である。秘密分散の一種である(ｋ,ｎ)-秘密分散は、入力された平文をｎ個に分割した分散値をｎ個の計算主体に分散しておき、任意のｋ個の分散値が揃えば平文を復元でき、ｋ個未満の分散値からは平文に関する一切の情報を得られないような秘密分散である。このとき、ｎ,ｋは1以上の整数であり、ｎ≧ｋである。(ｋ,ｎ)-秘密分散の代表的な例は、「A.Shamir, "How to share a secret", Communications of the ACM, Volume 22 Issue 11, pp. 612-613, 1979.（参考文献１）」に記載されている、Shamir秘密分散である。本開示で利用する秘密分散は、後述の秘密計算が利用可能な方法であればどのようなものであってもよい。 [Secret sharing technology]
Secret sharing is a technique in which data is converted into multiple shares, allowing the original data to be restored using a certain number of shares or more, but making it impossible to restore the original data using shares less than the certain number. (k,n)-secret sharing, a type of secret sharing, divides an input plaintext into n shares, distributes the shares to n computing entities, and allows the plaintext to be restored using any k shares, but no information about the plaintext can be obtained using shares less than k. Here, n and k are integers greater than or equal to 1, and n≧k. A typical example of (k,n)-secret sharing is Shamir secret sharing, described in A. Shamir, "How to share a secret," Communications of the ACM, Volume 22, Issue 11, pp. 612-613, 1979 (Reference 1). The secret sharing used in this disclosure may be any method that can utilize secure computation, as described below.

［秘密計算技術］
　秘密計算は、複数の計算主体に計算対象のデータを秘密分散して保存しておき、元のデータを復元することなく他の計算主体と協力して元のデータの関数値の分散値を計算する技術である。秘密計算では要素技術として秘密分散を利用する。 [Secret computing technology]
Secure computation is a technology in which data to be computed is secretly shared and stored among multiple computing entities, and the computing entities cooperate to calculate the shared values of functions of the original data without recovering the original data.Secure computation uses secret sharing as an elemental technology.

　本開示の第１実施形態で利用する秘密計算は、所望のデータ処理に必要な各種演算が特定の秘密分散方法による分散値に対して可能なものを適宜利用すればよい。秘密分散値に対して加算や乗算等の基本演算を行う秘密計算技術は、例えば、「千田浩司、濱田浩気、五十嵐大、高橋克己、"軽量検証可能３パーティ秘匿関数計算の再考"、コンピュータセキュリティシンポジウム2010、2010年（参考文献２）」に記載されている。データ列の秘密分散値から情報を秘匿したまま検索を行う秘密マッチング技術は、例えば、「千田浩司、寺田雅之、山口高康、五十嵐大、濱田浩気、高橋克巳、"統計的開示制御を考慮したセキュアマッチングプロトコル"、情報処理学会研究報告、2011-CSEC-52(12)、2011年（参考文献３）」に記載されている。データ列の秘密分散値を秘匿したまま整列する秘密ソート技術は、例えば、「濱田浩気、五十嵐大、千田浩司、高橋克巳、"秘匿関数計算上の線形時間ソート"、コンピュータセキュリティシンポジウム2011、2011年（参考文献４）」に記載されている。 The secure computation used in the first embodiment of the present disclosure may be one that allows the various operations required for the desired data processing to be performed on shared values obtained by a specific secret sharing method. A secure computation technique that performs basic operations such as addition and multiplication on secret shared values is described, for example, in "Koji Senda, Hiroki Hamada, Dai Igarashi, Katsumi Takahashi, "Rethinking Lightly Verifiable Three-Party Secure Function Computation," Computer Security Symposium 2010, 2010 (Reference 2)." A secret matching technique that searches for information from the secret shared values of a data string while keeping it secret is described, for example, in "Koji Senda, Masayuki Terada, Takayasu Yamaguchi, Dai Igarashi, Hiroki Hamada, Katsumi Takahashi, "Secure Matching Protocol Considering Statistical Disclosure Control," Information Processing Society of Japan Research Report, 2011-CSEC-52(12), 2011 (Reference 3)." A secret sorting technique that sorts data strings while keeping their secret sharing values secret is described, for example, in "Hamada Hiroki, Igarashi Dai, Senda Koji, Takahashi Katsumi, "Linear Time Sorting on Secure Function Computation," Computer Security Symposium 2011, 2011 (Reference 4)."

　以下、本発明の実施の形態について、詳細に説明する。なお、同じ機能を有する構成部には同じ番号を付し、重複説明を省略する。 The following describes in detail an embodiment of the present invention. Components with the same functions are assigned the same numbers, and duplicate explanations will be omitted.

First embodiment

　以下の説明では、Ｎは機械学習に用いるためのレコードの数、ｎは１以上Ｎ以下の整数、ｋはあらかじめ定められた正の整数とする。図１に、ネットワークを介して接続された複数の秘密計算サーバで構成される秘密計算システムと秘密計算クライアント装置の構成例を示す。図２に、秘密計算サーバの機能構成例を示す。図３に、実施例１のレコード抽出方法の処理フローを示す。図４はレコードの抽出方法のイメージを説明するための図である。 In the following explanation, N is the number of records to be used for machine learning, n is an integer between 1 and N, and k is a predetermined positive integer. Figure 1 shows an example configuration of a secure computation system and a secure computation client device, which are composed of multiple secure computation servers connected via a network. Figure 2 shows an example functional configuration of a secure computation server. Figure 3 shows the processing flow of the record extraction method of Example 1. Figure 4 is a diagram for explaining an image of the record extraction method.

　秘密計算システム１０はＸ台の秘密計算サーバ１００－１,...,Ｘで構成される。Ｘは例えば３以上とすればよいが、後述の秘密計算が利用可能な方法であれば１以上のいずれの数でもよい。秘密計算システム１０には、既存の技術を用いればよい（WO2012/046692（ファミリーの米国特許US8,989,391B2）に示された技術など、既存技術がある（参考文献５））。例えば、Ｘ＝３の場合、１つのデータは３つのシェアに変換され、３台の秘密計算サーバ１００－１，２，３は１つずつシェアを記録する。データを復元するためには、２つのシェアが必要である。秘密計算システム１０では、データを秘密分散した状態（秘匿化した状態）を維持したまま、計算を行うことができ、結果も結果のシェアとして秘匿化した状態で各秘密計算サーバ１００－ｘに記録される。結果が統計量であれば、２つ以上のシェアを取得し、復元することで統計量を知ることはできるが、元のデータの秘匿性は維持できる。 The secure computation system 10 is composed of X secure computation servers 100-1, ..., X. X may be, for example, 3 or greater, but can be any number greater than 1 as long as the method allows for secure computation, as described below. Existing technology can be used for the secure computation system 10 (such as the technology disclosed in WO2012/046692 (family U.S. Patent US8,989,391B2) (Reference 5)). For example, when X = 3, one piece of data is converted into three shares, and the three secure computation servers 100-1, 100-2, and 100-3 each record one share. Two shares are required to restore the data. In the secure computation system 10, calculations can be performed while maintaining the data in a secretly shared state (anonymized state), and the results are also recorded anonymously as shares of the results on each secure computation server 100-x. If the results are statistics, the statistics can be known by obtaining and restoring two or more shares, while maintaining the confidentiality of the original data.

　秘密計算クライアント装置２００は、秘密計算クライアント部２１０、ＡｕｔｏＭＬ部２２０、通信部２８０、記録部２９０を備える。通信部２８０は、ユーザ装置３００と接続してもよい。この場合、通信部２８０とユーザ装置３００との間の通信には暗号技術を用いればよい。記録部２９０には、ＡＩモデルの学習に使用されるレコードが記録される。それぞれのレコードは、複数の属性のデータで構成される。属性とは、例えば、年齢、性別、基礎的な医療情報（血液中の成分の値など）、病名などである。 The secure computation client device 200 comprises a secure computation client unit 210, an AutoML unit 220, a communication unit 280, and a recording unit 290. The communication unit 280 may be connected to the user device 300. In this case, encryption technology may be used for communication between the communication unit 280 and the user device 300. The recording unit 290 records records used for training the AI model. Each record consists of data with multiple attributes. Attributes include, for example, age, gender, basic medical information (such as the values of blood components), and disease name.

　秘密計算クライアント部２１０は、複数の秘密計算サーバ１００－１,...,Ｘで構成される秘密計算システムと接続され、秘密分散部２１１と復元部２１２を有する。秘密計算クライアント部２１０は、秘密計算システム１０へのレコードのシェア、指示情報のシェアの送信と、秘密計算システム１０からの計算結果のシェア、評価値のシェアの受信を行う。ここで、「秘密計算システム１０へのシェアの送信」とは、各秘密計算サーバ１００－ｘに対応するシェアをそれぞれ送信することを意味し、「秘密計算システム１０からのシェアの受信」とは、各秘密計算サーバ１００－ｘからそれぞれが記録しているシェアを受信することを意味する。つまり、秘密分散部２１１は、レコード、指示情報などを複数のシェアに変換し、各秘密計算サーバ１００－ｘに対応するシェアを送信する。指示情報とは、どの属性を用いてＡＩモデルの機械学習を行うのかなどの情報である。復元部２１２は、秘密計算システム１０での計算結果のシェア、評価値のシェアを受信し、平文の計算結果、評価値を取得する。 The secure computation client unit 210 is connected to a secure computation system consisting of multiple secure computation servers 100-1,...,X, and has a secret sharing unit 211 and a restoration unit 212. The secure computation client unit 210 transmits shares of records and shares of instruction information to the secure computation system 10, and receives shares of calculation results and shares of evaluation values from the secure computation system 10. Here, "transmitting shares to the secure computation system 10" means transmitting shares corresponding to each secure computation server 100-x, and "receiving shares from the secure computation system 10" means receiving shares recorded by each secure computation server 100-x. In other words, the secret sharing unit 211 converts records, instruction information, etc. into multiple shares and transmits shares corresponding to each secure computation server 100-x. The instruction information is information such as which attributes to use for machine learning of the AI model. The restoration unit 212 receives shares of calculation results and shares of evaluation values from the secure computation system 10, and acquires plaintext calculation results and evaluation values.

　秘密分散部２１１は、あらかじめＡＩモデルの機械学習に使用するレコードを複数のシェアに変換し、各秘密計算サーバ１００－ｘに対応するシェアを送信する。各秘密計算サーバ１００－ｘは、レコードを構成する属性のデータのシェアを記録部１９０に記録している。 The secret sharing unit 211 converts the records used for machine learning of the AI model into multiple shares in advance, and transmits the corresponding shares to each secure computation server 100-x. Each secure computation server 100-x records the shares of attribute data that make up the record in the recording unit 190.

　図２に示すように、秘密計算サーバ１００－ｘは、記録部１９０、乱数付与部１１０、結合部１２０、ソート部１３０、比較部１４０、シャッフル部１５０、削除部１６０、復元部１８０を備える。本発明では、レコードの集団の中から、指定した属性が同じクラスのレコードが均等になるようにレコードを抽出する。図４の集団には、４種類の丸があるが、その数には偏りがある。集団からランダムに丸を抽出すると集団の中に多く存在する丸を多く抽出することになり、数が少ない丸の数は少なくなる。一方、４種類の丸を当配分に抽出すれば、多く存在する丸も少ない丸も均等に抽出できる。 As shown in Figure 2, the secure computation server 100-x comprises a recording unit 190, a random number assignment unit 110, a combining unit 120, a sorting unit 130, a comparison unit 140, a shuffling unit 150, a deletion unit 160, and a restoration unit 180. In the present invention, records are extracted from a collection of records so that records of the same class with a specified attribute are evenly distributed. The collection in Figure 4 contains four types of circles, but there is a bias in their numbers. If circles are extracted randomly from the collection, more circles that are present in large numbers in the collection will be extracted, and the number of circles that are rare will decrease. On the other hand, if the four types of circles are extracted in an equal distribution, it is possible to extract both abundant and rare circles evenly.

　例えば、医療診断にＡＩを利用する場合を考えてみる。健康診断のデータの場合、最も多いのは精密検査不要という結果であり、精密検査を行う必要がある場合も、多い病気と少ない病気がある。有効な機械学習をするためには、レコードを抽出する際に、症例が少ない病気のレコード（データ）を削除しないようにする必要がある。また、健康診断の結果に基づく精密検査の必要性の一次判断にＡＩを利用する場合、精密検査が必要かもしれないという結果が多めに出力される方が望ましい。そこで、本発明では、秘密計算を用いて指定した属性のクラス比率が均等になるようにレコードを抽出することで、レコード数の少ないクラスのデータを漏れなく抽出する。 For example, consider the use of AI in medical diagnosis. In the case of health checkup data, the most common result is that no further examination is necessary, and even when further examination is necessary, there are diseases that are common and diseases that are rare. In order to perform effective machine learning, it is necessary to avoid deleting records (data) for diseases with few cases when extracting records. Furthermore, when using AI to make an initial judgment on the need for further examination based on the results of a health checkup, it is desirable to output more results indicating that further examination may be necessary. Therefore, in this invention, secret computation is used to extract records so that the class ratios of specified attributes are equal, thereby extracting data from classes with few records without omission.

　図５は、レコードの例である。２０個のレコードがあり、各レコードは属性Ｙのデータを有している。属性Ｙのクラスには、”０”，”１”，”２”，”３”，”４”の５種類がある。秘密計算サーバ１００－１,...,Ｘの乱数付与部１１０は、協調して乱数ｒ_ｎを生成し、乱数ｒ_ｎを秘密分散したシェア［ｒ_ｎ］をレコードごとに対応付ける（Ｓ１１０）。「協調」に関しては、既存の秘密計算技術（WO2012/046692（ファミリーの米国特許US8,989,391B2）に示された技術（参考文献５））と同じであり、既存の技術を用いればよい。秘密計算サーバ１００－１,...,Ｘの結合部１２０は、協調してレコード内のクラスｃ_ｎとレコードと対応付けられた乱数ｒ_ｎとの結合ｃ_ｎ∥ｒ_ｎのシェア［ｃ_ｎ∥ｒ_ｎ］を求める（Ｓ１２０）。図６は、レコードごとに乱数を生成し属性Ｙのデータに乱数を結合した例である。 FIG. 5 shows an example of records. There are 20 records, and each record has data of attribute Y. There are five classes of attribute Y: "0", "1", "2", "3", and "4". The random number assigning units 110 of the secure computation servers 100-1, ..., X cooperatively generate random numbers r _n and associate the shares [r _n ] obtained by secretly sharing the random numbers r _n with each record (S110). The "cooperation" is the same as the existing secure computation technology (the technology disclosed in WO2012/046692 (US Patent US8,989,391B2 of the same family) (Reference 5)), so any existing technology may be used. The combining units 120 of the secure computation servers 100-1, ..., X cooperatively calculate the shares [c _n ∥r _n ] of the combination c _n ∥r _n between the class c _n in the record and the random number r _n associated with the record (S120). FIG. 6 shows an example in which a random number is generated for each record and linked to the data of attribute Y.

　秘密計算サーバ１００－１,...,Ｘのソート部１３０は、協調して結合ｃ_ｎ∥ｒ_ｎをソートしたソート後シェア［ｃ_ｎ∥ｒ_ｎ］を求める（Ｓ１３０）。図７は結合ｃ_ｎ∥ｒ_ｎに基づいて、レコードをソートした結果を示している。つまり、結合ｃ_ｎ∥ｒ_ｎのソートではデータｃ_ｎも一緒にソートする。 The sorting units 130 of the secure computation servers 100-1,...,X cooperate to sort the join _cn∥r _n and obtain the post-sort share [ _cn∥r _n ] (S130). Fig. 7 shows the results of sorting records based on the _join cn∥r _n . In other words, when sorting the join _cn∥r _n , the data _cn is also sorted.

　秘密計算サーバ１００－１,...,Ｘの比較部１４０は、協調してソート後のクラスｃ_ｎとクラスｃ_ｎ－ｋを比較し、比較結果のシェアをレコードのシェアに対応付ける（Ｓ１４０）。ｋは、秘密計算クライアント装置２００が秘密計算システム１０に指定した整数であり、同じクラスのレコードを何個残すかを示している。図８はソート後にｋ個前のデータと比較した結果を示す図である。ソート後のｋ個前のデータと異なる場合は比較結果として”０”が出力され、同じ場合に比較結果として”１”が出力される。図８の例ではｋ＝２である。 The comparison units 140 of the secure computation servers 100-1,...,X cooperate to compare the sorted class c _n with the class c _n-k , and associate the share of the comparison result with the share of the record (S140). k is an integer specified by the secure computation client device 200 to the secure computation system 10, and indicates how many records of the same class are to be left. FIG. 8 is a diagram showing the result of a comparison with the k-th data after sorting. If the sorted data is different from the k-th data before it, "0" is output as the comparison result, and if the data is the same, "1" is output as the comparison result. In the example of FIG. 8, k=2.

　秘密計算サーバ１００－１,...,Ｘのシャッフル部１５０は、協調して比較結果をシャッフルし、シャッフル後の比較結果のシェアを求める（Ｓ１５０）。いずれかの秘密計算サーバ１００－ｘの復元部１８０、もしくは当該秘密計算システム１０に接続された秘密計算クライアント装置２００の復元部２１２が、シャッフル後の比較結果のシェアを用いてシャッフル後の比較結果を復元する（Ｓ２１２）。復元するのは、比較結果だけであり属性Ｙのデータ自体は復元しない。図３の例は秘密計算クライアント装置２００の復元部２１２が復元した例である。 The shuffling units 150 of the secure computation servers 100-1, ..., X cooperatively shuffle the comparison results and determine the share of the comparison results after shuffling (S150). The restoration unit 180 of one of the secure computation servers 100-x, or the restoration unit 212 of the secure computation client device 200 connected to the secure computation system 10, restores the comparison results after shuffling using the share of the comparison results after shuffling (S212). Only the comparison results are restored; the data of attribute Y itself is not restored. The example in Figure 3 is an example restored by the restoration unit 212 of the secure computation client device 200.

　その後、秘密計算サーバ１００－１,...,Ｘの削除部１６０は、協調して一致を意味する比較結果が対応付けられているレコードのシェアを削除する（Ｓ１６０）。図９は、シャッフルは行っていない状態で比較結果が”１”のレコードを削除した例を示している。図８の例ではｋ＝２であり、クラスの種類が５個だったので、図９では１０個のレコードが残っている。 Then, the deletion units 160 of the secure computation servers 100-1, ..., X cooperatively delete the shares of records associated with comparison results indicating a match (S160). Figure 9 shows an example in which records with a comparison result of "1" are deleted without shuffling. In the example of Figure 8, k = 2 and there were five types of classes, so in Figure 9, 10 records remain.

　属性Ｙのデータは秘密分散されて秘密計算サーバ１００－１,...,Ｘが記録しており、秘密計算でレコードを抽出する。よって、データの秘匿性は維持した上で属性のクラス比率が均等になるようにレコードを抽出できる。また、等配分法は指定した属性のクラス比率を均等にすることで属性の偏りによるＡＩ学習への影響をなくせる。その実現により、レコード数の少ないクラスのデータを漏れなく抽出しつつ、ＡＩ分析をより高速に処理できる。 The data for attribute Y is secretly distributed and recorded on secure computation servers 100-1,...,X, and records are extracted using secure computation. This allows records to be extracted so that the attribute class ratio is equal while maintaining the confidentiality of the data. Furthermore, the equal allocation method equalizes the class ratio of specified attributes, eliminating the impact of attribute bias on AI learning. This allows for faster processing of AI analysis while ensuring that data from classes with a small number of records is extracted without omission.

　上述のように、本発明は、２つのレコードを比較して重複排除するので、秘密計算ＡＩで等配分法を実現できるため、学習するレコード数を削減し、ＡＩ分析をより高速に処理できる。層化抽出法は、直前でなく抽出したい個数ｋの幅を確保して比較することで任意の個数を抽出できる。層化抽出法の一つである等配分法は、指定した属性のクラス比率を均等にすることで属性の偏りによる影響をなくせる。等配分法では現在のレコードがｋ個前のレコードと同じクラスか、つまり同じクラスがk個連続しているかのみを判定するため、レコード数の少ないクラスのデータを漏れなく抽出できる。判定するのは「同じクラス」か「違うクラス」かだけで、違うクラスの中身が何番目のクラスであるかは判定不要で、また、処理を変える必要もないため、高速処理が実現される。 As described above, this invention compares two records and eliminates duplicates, making it possible to implement the equal distribution method with secure computing AI, thereby reducing the number of records to be learned and enabling faster AI analysis. Stratified sampling can extract any number of records by ensuring a range of the number k you want to extract and comparing them instead of just before. Equal distribution, one of the stratified sampling methods, can eliminate the impact of attribute bias by equalizing the class ratio of a specified attribute. Equal distribution only determines whether the current record is of the same class as the k records before it, in other words, whether there are k consecutive records of the same class, so it can extract data from classes with a small number of records without omitting anything. The only thing that needs to be determined is whether they are "same class" or "different class," and there is no need to determine which class the contents of the different class are, and there is no need to change the processing, allowing for high-speed processing.

　秘密計算ＡＩのレコード抽出法として等分配法を実現し、処理が高速化されるため、これまで実現できなかった秘密計算ＡＩを現実的な条件（許容範囲内の計算時間）で実装し、全体としてデータを秘匿できる。単純なランダムサンプリングを使ったレコード抽出による効率化だとレコード数の少ないクラスのデータが抽出されない恐れがある。しかし、等分配法は、レコード数がｋ個以下の場合は単純にすべてのレコードが削除されず抽出できる。複数企業の蓄積するデータを持ち寄って、安全にデータを利活用でき、また、秘密計算ＡＩの適用性拡大（ＡＩによるデータ分析等）、導入容易性向上（事業アセット連携等）を図り、幅広いユーザへの展開が可能となる。
［変形例１］ By implementing the equipartition method as a record extraction method for secure computation AI and speeding up processing, it is now possible to implement secure computation AI under realistic conditions (within acceptable computation time), which was previously unattainable, and keep data confidential as a whole. Improving efficiency by using simple random sampling to extract records risks not extracting data from classes with a small number of records. However, the equipartition method allows for extraction without simply deleting all records when the number of records is k or less. Data accumulated by multiple companies can be pooled and used safely. Furthermore, it expands the applicability of secure computation AI (e.g., data analysis using AI) and improves ease of implementation (e.g., business asset integration), enabling deployment to a wide range of users.
[Modification 1]

　ただし、同じクラスのレコードがｋ個未満の場合は、クラスの種類にｋを乗じた数よりも残るレコードの数が少なくなる。この場合、同じクラスのレコードがｋ個未満のデータがどの程度の数あったのかは分かる。そこで、変形例１では同じクラスのレコードがｋ個未満のデータが存在したか否か分からない処理を付加する。図２に示すように、秘密計算サーバ１００－ｘは、レコード数取得部１７０、不足数算出部１７５、ダミー付加部１８５も備える。図１０に、変形例１のレコード抽出方法の処理フローを示す。 However, if there are fewer than k records of the same class, the number of remaining records will be less than the number obtained by multiplying the class type by k. In this case, it is possible to determine how many records of the same class there are that have fewer than k records. Therefore, in variant 1, a process is added that determines whether or not there is data with fewer than k records of the same class. As shown in Figure 2, the secure computation server 100-x also includes a record number acquisition unit 170, a shortage number calculation unit 175, and a dummy addition unit 185. Figure 10 shows the processing flow of the record extraction method of variant 1.

　ステップＳ１１０～Ｓ１４０は、図３と同じである。秘密計算サーバ１００－１,...,Ｘのレコード数取得部１７０は、協調してクラスｃ_ｎごとに、同じクラスのレコード数ｍ_Ｌのシェア［ｍ_Ｌ］を求める（Ｓ１７０）。秘密計算サーバ１００－１,...,Ｘの不足数算出部１７５は、協調して同じクラスのレコード数ｍ_Ｌがｋ未満のクラスに対して不足数ｓｉ＝ｋ－ｍ_Ｌのシェア［ｓｉ］を求め、求めたすべての不足数ｓｉの合計Ｓのシェア［Ｓ］を求める（Ｓ１７５）。秘密計算サーバ１００－１,...,Ｘのダミー付加部１８５は、協調してＳ個のダミーデータのシェアをレコードのシェアに付加する（Ｓ１８５）。ステップＳ１５０とステップＳ２１２は、実施例1と同じである。 Steps S110 to S140 are the same as in FIG. 3. The record number acquisition units 170 of the secure computation servers 100-1, ..., X cooperatively calculate the shares [m _L ] of the number of records m _L of the same class for each class c _n (S170). The shortage calculation units 175 of the secure computation servers 100-1, ..., X cooperatively calculate the shares [si] of the shortage si = k - m _L for classes where the number of records m _L of the same class is less than k, and calculate the shares [S] of the total S of all the calculated shortages si (S175). The dummy addition units 185 of the secure computation servers 100-1, ..., X cooperatively add S shares of dummy data to the shares of the records (S185). Steps S150 and S212 are the same as in the first embodiment.

　その後、秘密計算サーバ１００－１,...,Ｘの削除部１６０は、協調して一致を意味する比較結果が対応付けられているレコードのシェアと、ダミーデータのシェアを削除する（Ｓ１６１）。なお、不足数の合計Ｓを積極的に知らせたい場合は、いずれかの秘密計算サーバ１００－ｘの復元部１８０、もしくは秘密計算システム１０に接続された秘密計算クライアント装置２００の復元部２１２が、シェア［Ｓ］を用いて合計Ｓを復元してもよい。 Then, the deletion units 160 of the secure computation servers 100-1, ..., X cooperatively delete the shares of the records associated with the comparison results indicating a match, and the shares of the dummy data (S161). Note that if it is desired to proactively notify the total number of missing shares, S, the restoration unit 180 of any of the secure computation servers 100-x, or the restoration unit 212 of the secure computation client device 200 connected to the secure computation system 10, may restore the total S using the shares [S].

　本変形例の場合も、実施例1と同様に、データの秘匿性は維持した上で属性のクラス比率が均等になるようにレコードを抽出できる。また、等配分法は指定した属性のクラス比率を均等にすることで属性の偏りによるＡＩ学習への影響をなくせる。その実現により、レコード数の少ないクラスのデータを漏れなく抽出しつつ、ＡＩ分析をより高速に処理できる。 In this modified example, as in Example 1, records can be extracted so that the class ratio of attributes is equal while maintaining data confidentiality. Furthermore, the equal allocation method can eliminate the impact of attribute bias on AI learning by equalizing the class ratio of specified attributes. This makes it possible to extract data from classes with a small number of records without omission, while also processing AI analysis more quickly.

［プロセッサ、プログラム、記録媒体］
　本明細書中に記載されている構成要素により実現される機能は、当該記載された機能を実現するようにプログラムされた、汎用プロセッサ、特定用途プロセッサ、集積回路、ＡＳＩＣｓ(Application Specific Integrated Circuits)、ＣＰＵ(a Central Processing Unit)、従来型の回路、および／又はそれらの組合せを含む、circuitry又はprocessing circuitryにおいて実装されてもよい。プロセッサは、トランジスタやその他の回路を含み、 circuitry又はprocessing circuitryとみなされる。プロセッサは、メモリに格納されたプログラムを実行する、programmed processorであってもよい。 [Processor, program, recording medium]
The functions performed by the components described herein may be implemented in circuitry or processing circuitry, including general-purpose processors, application-specific processors, integrated circuits, ASICs (Application Specific Integrated Circuits), a CPU (a Central Processing Unit), conventional circuits, and/or combinations thereof, programmed to perform the described functions. A processor includes transistors and other circuits and is considered to be circuitry or processing circuitry. A processor may also be a programmed processor that executes programs stored in memory.

　本明細書において、circuitry、ユニット、手段は、記載された機能を実現するようにプログラムされたハードウェア、又は実行するハードウェアである。当該ハードウェアは、本明細書に開示されているあらゆるハードウェア、又は、当該記載された機能を実現するようにプログラムされた、又は、実行するものとして知られているあらゆるハードウェアであってもよい。 In this specification, a circuit, unit, or means refers to hardware that is programmed to realize or executes the described functions. The hardware may be any hardware disclosed in this specification or any hardware known to be programmed to realize or execute the described functions.

　当該ハードウェアがcircuitryのタイプであるとみなされるプロセッサである場合、当該circuitry、手段、又はユニットは、ハードウェアと、当該ハードウェア及び又はプロセッサを構成する為に用いられるソフトウェアの組合せである。 If the hardware is a processor, which is considered to be a type of circuitry, the circuitry, means, or unit is the combination of the hardware and the software used to configure the hardware and/or processor.

　上述の各種の処理は、図１１に示すコンピュータ２０００の記録部２０２０に、上記方法の各ステップを実行させるプログラムを読み込ませ、制御部２０１０、入力部２０３０、出力部２０４０、表示部２０５０などに動作させることで実施できる。 The various processes described above can be implemented by loading a program that executes each step of the above method into the recording unit 2020 of the computer 2000 shown in FIG. 11 and operating the control unit 2010, input unit 2030, output unit 2040, display unit 2050, etc.

　この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、例えば、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等どのようなものでもよい。 The program describing this processing can be recorded on a computer-readable recording medium. Examples of computer-readable recording media include magnetic recording devices, optical disks, magneto-optical recording media, and semiconductor memory.

　また、このプログラムの流通は、例えば、そのプログラムを記録したＤＶＤ、ＣＤ－ＲＯＭ等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。 In addition, this program may be distributed, for example, by selling, transferring, or lending portable recording media such as DVDs or CD-ROMs on which the program is recorded. Furthermore, the program may be stored in a storage device on a server computer, and then transferred from the server computer to other computers via a network, thereby distributing the program.

　このようなプログラムを実行するコンピュータは、例えば、まず、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、一旦、自己の記憶装置に格納する。そして、処理の実行時、このコンピュータは、自己の記録媒体に格納されたプログラムを読み取り、読み取ったプログラムに従った処理を実行する。また、このプログラムの別の実行形態として、コンピュータが可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することとしてもよく、さらに、このコンピュータにサーバコンピュータからプログラムが転送されるたびに、逐次、受け取ったプログラムに従った処理を実行することとしてもよい。また、サーバコンピュータから、このコンピュータへのプログラムの転送は行わず、その実行指示と結果取得のみによって処理機能を実現する、いわゆるＡＳＰ（Application Service Provider）型のサービスによって処理を実行する構成としてもよい。さらには、サーバコンピュータの一部をプログラムと共にユーザに使用させる、いわゆるＳａａＳ（Software as a Service）型のサービスを利用して、端末の処理を実行する構成としてもよい。なお、本形態におけるプログラムには、電子計算機による処理の用に供する情報であってプログラムに準ずるもの（コンピュータに対する直接の指令ではないがコンピュータの処理を規定する性質を有するデータ等）を含むものとする。 A computer that executes such a program, for example, first stores the program recorded on a portable recording medium or transferred from a server computer in its own storage device. Then, when executing processing, the computer reads the program stored on its own recording medium and executes processing in accordance with the read program. As another form of execution of this program, the computer may read the program directly from the portable recording medium and execute processing in accordance with that program, or it may execute processing in accordance with the received program each time a program is transferred to this computer from the server computer. Alternatively, the server computer may not transfer the program to this computer, but rather executes processing using a so-called ASP (Application Service Provider) type service, which realizes processing functions simply by issuing execution instructions and obtaining results. Furthermore, the computer may be configured to execute terminal processing using a so-called SaaS (Software as a Service) type service, which allows users to use part of a server computer along with the program. In this embodiment, the program includes information used for processing by an electronic computer that is equivalent to a program (such as data that is not a direct command to a computer but has properties that dictate computer processing).

　また、この形態では、コンピュータ上で所定のプログラムを実行させることにより、本装置を構成することとしたが、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。 In addition, in this embodiment, the device is configured by executing a specific program on a computer, but at least part of the processing may also be implemented in hardware.

１０　秘密計算システム　　　　　　　　　　１００　秘密計算サーバ
１１０　乱数付与部　　　　　　　　　　　　１２０　結合部
１３０　ソート部　　　　　　　　　　　　　１４０　比較部
１５０　シャッフル部　　　　　　　　　　　１６０　削除部
１７０　レコード数取得部　　　　　　　　　１７５　不足数算出部
１８０　復元部　　　　　　　　　　　　　　１８５　ダミー付加部
１９０　記録部　　　　　　　　　　　　　　２００　秘密計算クライアント装置
２１０　秘密計算クライアント部　　　　　　２１１　秘密分散部
２１２　復元部　　　　　　　　　　　　　　２２０　ＡｕｔｏＭＬ部
２８０　通信部　　　　　　　　　　　　　　２９０　記録部
３００　ユーザ装置 REFERENCE SIGNS LIST 10 Secure computation system 100 Secure computation server 110 Random number assignment unit 120 Combination unit 130 Sorting unit 140 Comparison unit 150 Shuffling unit 160 Deletion unit 170 Record number acquisition unit 175 Deficiency number calculation unit 180 Restoration unit 185 Dummy addition unit 190 Recording unit 200 Secure computation client device 210 Secure computation client unit 211 Secret sharing unit 212 Restoration unit 220 AutoML unit 280 Communication unit 290 Recording unit 300 User device

Claims

A secure computation system comprising one or more secure computation servers,
N is the number of records to be used for machine learning, n is an integer between 1 and N, and k is a predetermined positive integer.
Each secure computation server:
a recording unit that records the secretly shared shares of the record;
a random number assigning unit that associates a share [ _rn ] obtained by secretly sharing a random number _rn with each record;
a combining unit that calculates a share [c _n ∥r _n ] of a combining c _n ∥r _n between a class c _n in a record and a random number r _n associated with the record;
a sorting unit that sorts the join c _n ∥r _n to obtain a sorted share [c _n ∥r _n ];
a comparison unit that compares the sorted class c _n with the class c _n−k and associates the share of the comparison result with the share of the record;
a shuffling unit that shuffles the comparison results and determines a share of the comparison results after shuffling;
a deletion unit that deletes a share of a record associated with a comparison result indicating a match,
A secure computation system characterized in that any of the secure computation servers or a secure computation client device connected to the secure computation system comprises a restoration unit that restores the post-shuffle comparison result using the share of the post-shuffle comparison result.

2. The secure computing system according to claim 1,
Each secure computation server:
a record number obtaining unit that obtains a share [m _L ] of the number of records m _L of the same class for each class c _n ;
a shortage calculation unit that calculates a share [si] of the shortage si=k-m _L for classes where the number of records m _L of the same class is less than k, and calculates a share [S] of the total S of all the calculated shortages si;
A secure computing system further comprising a dummy adding unit that adds S shares of dummy data to the shares of the record.

3. The secure computing system according to claim 1,
The secure computation system further comprises a secure computation client device for using the secure computation system via a network.

A record extraction method using a secure computation system configured with one or more secure computation servers,
N is the number of records to be used for machine learning, n is an integer between 1 and N, and k is a predetermined positive integer.
Each secure computation server records the secretly shared shares in the recorder.
Each secure computation server cooperates to
A share [ _rn ] obtained by secretly sharing the random number _rn is associated with each record,
A share [c _n ∥r _n ] of the connection c _n ∥r _n between the class c _n in the record and the random number r _n associated with the record is calculated.
The combined c _n ∥r _n is sorted to obtain the post-sort share [c _n ∥r _n ].
The sorted class c _n is compared with the class c _n−k , and the share of the comparison result is associated with the share of the record;
shuffling the comparison results and determining a share of the comparison results after shuffling;
A record extraction method that deletes the share of records for which the comparison result indicates a match.