WO2026009409A1 - Secure computation system and record extraction method - Google Patents
Secure computation system and record extraction methodInfo
- Publication number
- WO2026009409A1 WO2026009409A1 PCT/JP2024/024395 JP2024024395W WO2026009409A1 WO 2026009409 A1 WO2026009409 A1 WO 2026009409A1 JP 2024024395 W JP2024024395 W JP 2024024395W WO 2026009409 A1 WO2026009409 A1 WO 2026009409A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- share
- secure computation
- unit
- record
- records
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09C—CIPHERING OR DECIPHERING APPARATUS FOR CRYPTOGRAPHIC OR OTHER PURPOSES INVOLVING THE NEED FOR SECRECY
- G09C1/00—Apparatus or methods whereby a given sequence of signs, e.g. an intelligible text, is transformed into an unintelligible sequence of signs by transposing the signs or groups of signs or by replacing them by others according to a predetermined system
Definitions
- the present invention relates to a secure computation system consisting of multiple secure computation servers and a record extraction method.
- Secure computing AI is a technology that brings together data accumulated by multiple companies and makes safe use of it. It enables the creation of models and inference (data manipulation in which no one can see anything but the calculation results) using extremely secure and diverse AI algorithms, while keeping the data confidential (secret distribution) and never restoring it. Secure computing AI can realize statistical analysis and machine learning services that can be used while keeping the data confidential, but because analysis and other processing is performed without seeing the data, it is extremely inefficient compared to plain text, and to date, there have been no examples of a system being implemented that operates under realistic usage conditions (acceptable calculation time).
- Non-Patent Documents 1 and 2 is a method for thoroughly extracting data from classes with a small number of records.
- Non-Patent Documents 1 and 2 cannot be simply applied to secure computing AI. Therefore, the present invention aims to provide a method for extracting data from classes with a small number of records without omission, using secure computing.
- the secure computation system of the present invention is composed of one or more secure computation servers.
- Each secure computation server includes a recording unit, a random number assignment unit, a combining unit, a sorting unit, a comparison unit, a shuffling unit, and a deleting unit.
- the random number assignment unit associates a share [r n ] obtained by secretly sharing a random number r n with each record.
- the combining unit obtains a share [c n ⁇ r n ] of the combination c n ⁇ r n between a class c n in the record and the random number r n associated with the record.
- the sorting unit obtains a post-sorted share [ c n ⁇ r n ] by sorting the combination c n ⁇ r n .
- the comparison unit compares the sorted class c n with the class c n-k and associates the share of the comparison result with the share of the record.
- the shuffling unit shuffles the comparison result and obtains the share of the shuffled comparison result.
- the deleting unit deletes the share of the record associated with a comparison result indicating a match.
- Any of the secure computation servers or a secure computation client device connected to the secure computation system includes a restoration unit that restores the post-shuffle comparison result using the share of the post-shuffle comparison result.
- the secure computation system of the present invention extracts records so that the class ratio of the specified attribute is equal, providing a method for using secure computation to thoroughly extract data from classes with a small number of records.
- FIG. 1 is a diagram showing an example of the configuration of a secure computation system made up of multiple secure computation servers connected via a network and a secure computation client device.
- FIG. 2 is a diagram showing an example of the functional configuration of a secure computation server.
- FIG. 4 is a diagram showing the processing flow of the record extraction method according to the first embodiment.
- FIG. 10 is a diagram for explaining an image of a record extraction method.
- FIG. 10 is a diagram showing an example of a record.
- FIG. 10 is a diagram showing an example in which a random number is generated for each record and is linked to data of attribute Y.
- FIG. 10 shows the results of sorting records based on the join c n ⁇ r n .
- FIG. 10 is a diagram showing the results of comparison with k previous data after sorting.
- FIG. 10 is a diagram showing an example in which records with a comparison result of "1" are deleted without shuffling.
- FIG. 10 is a diagram showing the processing flow of a record extraction method according to Modification 1.
- FIG. 2 is a diagram showing an example of the functional configuration of a computer.
- Secret sharing is a technique in which data is converted into multiple shares, allowing the original data to be restored using a certain number of shares or more, but making it impossible to restore the original data using shares less than the certain number.
- (k,n)-secret sharing a type of secret sharing, divides an input plaintext into n shares, distributes the shares to n computing entities, and allows the plaintext to be restored using any k shares, but no information about the plaintext can be obtained using shares less than k.
- n and k are integers greater than or equal to 1, and n ⁇ k.
- a typical example of (k,n)-secret sharing is Shamir secret sharing, described in A. Shamir, "How to share a secret," Communications of the ACM, Volume 22, Issue 11, pp. 612-613, 1979 (Reference 1).
- the secret sharing used in this disclosure may be any method that can utilize secure computation, as described below.
- the secure computation used in the first embodiment of the present disclosure may be one that allows the various operations required for the desired data processing to be performed on shared values obtained by a specific secret sharing method.
- a secure computation technique that performs basic operations such as addition and multiplication on secret shared values is described, for example, in “Koji Senda, Hiroki Hamada, Dai Igarashi, Katsumi Takahashi, "Rethinking Lightly Verifiable Three-Party Secure Function Computation," Computer Security Symposium 2010, 2010 (Reference 2).”
- a secret matching technique that searches for information from the secret shared values of a data string while keeping it secret is described, for example, in “Koji Senda, Masayuki Terada, Takayasu Yamaguchi, Dai Igarashi, Hiroki Hamada, Katsumi Takahashi, “Secure Matching Protocol Considering Statistical Disclosure Control," Information Processing Society of Japan Research Report, 2011-CSEC-52(12), 2011 (Reference 3).”
- N is the number of records to be used for machine learning
- n is an integer between 1 and N
- k is a predetermined positive integer.
- Figure 1 shows an example configuration of a secure computation system and a secure computation client device, which are composed of multiple secure computation servers connected via a network.
- Figure 2 shows an example functional configuration of a secure computation server.
- Figure 3 shows the processing flow of the record extraction method of Example 1.
- Figure 4 is a diagram for explaining an image of the record extraction method.
- the secure computation system 10 is composed of X secure computation servers 100-1, ..., X.
- X may be, for example, 3 or greater, but can be any number greater than 1 as long as the method allows for secure computation, as described below.
- Existing technology can be used for the secure computation system 10 (such as the technology disclosed in WO2012/046692 (family U.S. Patent US8,989,391B2) (Reference 5)).
- WO2012/046692 family U.S. Patent US8,989,391B2
- calculations can be performed while maintaining the data in a secretly shared state (anonymized state), and the results are also recorded anonymously as shares of the results on each secure computation server 100-x. If the results are statistics, the statistics can be known by obtaining and restoring two or more shares, while maintaining the confidentiality of the original data.
- the secure computation client device 200 comprises a secure computation client unit 210, an AutoML unit 220, a communication unit 280, and a recording unit 290.
- the communication unit 280 may be connected to the user device 300.
- encryption technology may be used for communication between the communication unit 280 and the user device 300.
- the recording unit 290 records records used for training the AI model. Each record consists of data with multiple attributes. Attributes include, for example, age, gender, basic medical information (such as the values of blood components), and disease name.
- the secure computation client unit 210 is connected to a secure computation system consisting of multiple secure computation servers 100-1,...,X, and has a secret sharing unit 211 and a restoration unit 212.
- the secure computation client unit 210 transmits shares of records and shares of instruction information to the secure computation system 10, and receives shares of calculation results and shares of evaluation values from the secure computation system 10.
- “transmitting shares to the secure computation system 10” means transmitting shares corresponding to each secure computation server 100-x
- “receiving shares from the secure computation system 10” means receiving shares recorded by each secure computation server 100-x.
- the secret sharing unit 211 converts records, instruction information, etc. into multiple shares and transmits shares corresponding to each secure computation server 100-x.
- the instruction information is information such as which attributes to use for machine learning of the AI model.
- the restoration unit 212 receives shares of calculation results and shares of evaluation values from the secure computation system 10, and acquires plaintext calculation results and evaluation values.
- the secret sharing unit 211 converts the records used for machine learning of the AI model into multiple shares in advance, and transmits the corresponding shares to each secure computation server 100-x.
- Each secure computation server 100-x records the shares of attribute data that make up the record in the recording unit 190.
- the secure computation server 100-x comprises a recording unit 190, a random number assignment unit 110, a combining unit 120, a sorting unit 130, a comparison unit 140, a shuffling unit 150, a deletion unit 160, and a restoration unit 180.
- records are extracted from a collection of records so that records of the same class with a specified attribute are evenly distributed.
- the collection in Figure 4 contains four types of circles, but there is a bias in their numbers. If circles are extracted randomly from the collection, more circles that are present in large numbers in the collection will be extracted, and the number of circles that are rare will decrease. On the other hand, if the four types of circles are extracted in an equal distribution, it is possible to extract both abundant and rare circles evenly.
- FIG. 5 shows an example of records.
- the random number assigning units 110 of the secure computation servers 100-1, ..., X cooperatively generate random numbers r n and associate the shares [r n ] obtained by secretly sharing the random numbers r n with each record (S110).
- the "cooperation” is the same as the existing secure computation technology (the technology disclosed in WO2012/046692 (US Patent US8,989,391B2 of the same family) (Reference 5)), so any existing technology may be used.
- the combining units 120 of the secure computation servers 100-1, ..., X cooperatively calculate the shares [c n ⁇ r n ] of the combination c n ⁇ r n between the class c n in the record and the random number r n associated with the record (S120).
- FIG. 6 shows an example in which a random number is generated for each record and linked to the data of attribute Y.
- the sorting units 130 of the secure computation servers 100-1,...,X cooperate to sort the join cn ⁇ r n and obtain the post-sort share [ cn ⁇ r n ] (S130).
- Fig. 7 shows the results of sorting records based on the join cn ⁇ r n . In other words, when sorting the join cn ⁇ r n , the data cn is also sorted.
- the comparison units 140 of the secure computation servers 100-1,...,X cooperate to compare the sorted class c n with the class c n-k , and associate the share of the comparison result with the share of the record (S140).
- k is an integer specified by the secure computation client device 200 to the secure computation system 10, and indicates how many records of the same class are to be left.
- the shuffling units 150 of the secure computation servers 100-1, ..., X cooperatively shuffle the comparison results and determine the share of the comparison results after shuffling (S150).
- the example in Figure 3 is an example restored by the restoration unit 212 of the secure computation client device 200.
- Figure 9 shows an example in which records with a comparison result of "1" are deleted without shuffling.
- k 2 and there were five types of classes, so in Figure 9, 10 records remain.
- the data for attribute Y is secretly distributed and recorded on secure computation servers 100-1,...,X, and records are extracted using secure computation. This allows records to be extracted so that the attribute class ratio is equal while maintaining the confidentiality of the data. Furthermore, the equal allocation method equalizes the class ratio of specified attributes, eliminating the impact of attribute bias on AI learning. This allows for faster processing of AI analysis while ensuring that data from classes with a small number of records is extracted without omission.
- this invention compares two records and eliminates duplicates, making it possible to implement the equal distribution method with secure computing AI, thereby reducing the number of records to be learned and enabling faster AI analysis.
- Stratified sampling can extract any number of records by ensuring a range of the number k you want to extract and comparing them instead of just before.
- Equal distribution one of the stratified sampling methods, can eliminate the impact of attribute bias by equalizing the class ratio of a specified attribute. Equal distribution only determines whether the current record is of the same class as the k records before it, in other words, whether there are k consecutive records of the same class, so it can extract data from classes with a small number of records without omitting anything. The only thing that needs to be determined is whether they are "same class" or "different class,” and there is no need to determine which class the contents of the different class are, and there is no need to change the processing, allowing for high-speed processing.
- the equipartition method As a record extraction method for secure computation AI and speeding up processing, it is now possible to implement secure computation AI under realistic conditions (within acceptable computation time), which was previously unattainable, and keep data confidential as a whole. Improving efficiency by using simple random sampling to extract records risks not extracting data from classes with a small number of records.
- the equipartition method allows for extraction without simply deleting all records when the number of records is k or less. Data accumulated by multiple companies can be pooled and used safely.
- secure computation AI e.g., data analysis using AI
- ease of implementation e.g., business asset integration
- the secure computation server 100-x also includes a record number acquisition unit 170, a shortage number calculation unit 175, and a dummy addition unit 185.
- Figure 10 shows the processing flow of the record extraction method of variant 1.
- Steps S110 to S140 are the same as in FIG. 3.
- the record number acquisition units 170 of the secure computation servers 100-1, ..., X cooperatively calculate the shares [m L ] of the number of records m L of the same class for each class c n (S170).
- the dummy addition units 185 of the secure computation servers 100-1, ..., X cooperatively add S shares of dummy data to the shares of the records (S185).
- Steps S150 and S212 are the same as in the first embodiment.
- the deletion units 160 of the secure computation servers 100-1, ..., X cooperatively delete the shares of the records associated with the comparison results indicating a match, and the shares of the dummy data (S161). Note that if it is desired to proactively notify the total number of missing shares, S, the restoration unit 180 of any of the secure computation servers 100-x, or the restoration unit 212 of the secure computation client device 200 connected to the secure computation system 10, may restore the total S using the shares [S].
- processors may be implemented in circuitry or processing circuitry, including general-purpose processors, application-specific processors, integrated circuits, ASICs (Application Specific Integrated Circuits), a CPU (a Central Processing Unit), conventional circuits, and/or combinations thereof, programmed to perform the described functions.
- a processor includes transistors and other circuits and is considered to be circuitry or processing circuitry.
- a processor may also be a programmed processor that executes programs stored in memory.
- a circuit, unit, or means refers to hardware that is programmed to realize or executes the described functions.
- the hardware may be any hardware disclosed in this specification or any hardware known to be programmed to realize or execute the described functions.
- the hardware is a processor, which is considered to be a type of circuitry
- the circuitry, means, or unit is the combination of the hardware and the software used to configure the hardware and/or processor.
- the program describing this processing can be recorded on a computer-readable recording medium.
- Examples of computer-readable recording media include magnetic recording devices, optical disks, magneto-optical recording media, and semiconductor memory.
- this program may be distributed, for example, by selling, transferring, or lending portable recording media such as DVDs or CD-ROMs on which the program is recorded.
- the program may be stored in a storage device on a server computer, and then transferred from the server computer to other computers via a network, thereby distributing the program.
- a computer that executes such a program for example, first stores the program recorded on a portable recording medium or transferred from a server computer in its own storage device. Then, when executing processing, the computer reads the program stored on its own recording medium and executes processing in accordance with the read program. As another form of execution of this program, the computer may read the program directly from the portable recording medium and execute processing in accordance with that program, or it may execute processing in accordance with the received program each time a program is transferred to this computer from the server computer. Alternatively, the server computer may not transfer the program to this computer, but rather executes processing using a so-called ASP (Application Service Provider) type service, which realizes processing functions simply by issuing execution instructions and obtaining results.
- ASP Application Service Provider
- the computer may be configured to execute terminal processing using a so-called SaaS (Software as a Service) type service, which allows users to use part of a server computer along with the program.
- the program includes information used for processing by an electronic computer that is equivalent to a program (such as data that is not a direct command to a computer but has properties that dictate computer processing).
- the device is configured by executing a specific program on a computer, but at least part of the processing may also be implemented in hardware.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
本発明は、複数の秘密計算サーバで構成される秘密計算システムとレコード抽出方法に関する。 The present invention relates to a secure computation system consisting of multiple secure computation servers and a record extraction method.
AI等でのデータ利活用の機運が高まる一方、プライバシー保護の要求も高まっており、医療分野等で機密性の高いデータを実績のあるAIアルゴリズムで分析できる環境が求められている。秘密計算AIは、複数企業の蓄積するデータを持ち寄って、安全に利活用するための技術である。データを秘匿化(秘密分散)したまま一度も復元することなく、極めて安全で多様なAIアルゴリズムによるモデルの作成、推論(計算結果以外は誰にも見えないデータ運用)が可能となる。秘密計算AIは、データを秘匿したまま利用可能な統計分析・機械学習サービスを実現できるが、データを見ずに分析等処理を実施するため、平文と比較して非常に効率が悪く、これまで現実的な利用条件(許容範囲内の計算時間)で動作するシステムが実装された例はない。 While momentum for data utilization using AI and other technologies is growing, so too is the demand for privacy protection, creating a need for an environment where highly confidential data can be analyzed using proven AI algorithms in fields such as healthcare. Secure computing AI is a technology that brings together data accumulated by multiple companies and makes safe use of it. It enables the creation of models and inference (data manipulation in which no one can see anything but the calculation results) using extremely secure and diverse AI algorithms, while keeping the data confidential (secret distribution) and never restoring it. Secure computing AI can realize statistical analysis and machine learning services that can be used while keeping the data confidential, but because analysis and other processing is performed without seeing the data, it is extremely inefficient compared to plain text, and to date, there have been no examples of a system being implemented that operates under realistic usage conditions (acceptable calculation time).
秘密計算AIを秘密計算そのままの形態で実装するのは難しい。つまりモデルの種別、パラメータの深さ、構造情報他、完全な秘密計算でやるためには、すべての学習を抽象化することで長時間に渡る計算が必要となり、ユーザビリティを損ねるため非現実的である。このため実用的な計算速度で秘密計算できる技術が求められている。許容可能な時間内に実行するため、学習に用いるレコードを抽出・選別して効率的な探索を行う必要がある。例えば、レコード数の少ないクラスのデータを漏れなく抽出する方法として層化抽出法(非特許文献1,2)がある。 It is difficult to implement secure computation AI in the form of secure computation itself. In other words, to perform fully secure computation, taking into account model type, parameter depth, structural information, etc., abstracting all learning would require lengthy calculations, which would impair usability and make it unrealistic. For this reason, there is a demand for technology that can perform secure computation at a practical calculation speed. To execute within an acceptable timeframe, it is necessary to extract and select records to be used for learning and perform an efficient search. For example, stratified sampling (Non-Patent Documents 1 and 2) is a method for thoroughly extracting data from classes with a small number of records.
しかしながら、非特許文献1,2に示されたようなレコード数の少ないクラスのデータを漏れなく抽出する方法を、秘密計算AIに単純に適用することはできない。そこで、本発明は、秘密計算を用いてレコード数の少ないクラスのデータを漏れなく抽出する方法を提供することを目的とする。 However, the methods for extracting data from classes with a small number of records without omission, as shown in Non-Patent Documents 1 and 2, cannot be simply applied to secure computing AI. Therefore, the present invention aims to provide a method for extracting data from classes with a small number of records without omission, using secure computing.
本発明の秘密計算システムは、1台以上の秘密計算サーバで構成される。各秘密計算サーバは、記録部、乱数付与部、結合部、ソート部、比較部、シャッフル部、削除部を備える。乱数付与部は、乱数rnを秘密分散したシェア[rn]をレコードごとに対応付ける。結合部は、レコード内のクラスcnとレコードと対応付けられた乱数rnとの結合cn∥rnのシェア[cn∥rn]を求める。ソート部は、結合cn∥rnをソートしたソート後シェア[cn∥rn]を求める。比較部は、ソート後のクラスcnとクラスcn-kを比較し、比較結果のシェアをレコードのシェアに対応付ける。シャッフル部は、比較結果をシャッフルし、シャッフル後の比較結果のシェアを求める。削除部は、一致を意味する比較結果が対応付けられているレコードのシェアを削除する。秘密計算サーバのいずれか、もしくは当該秘密計算システムに接続された秘密計算クライアント装置が、シャッフル後の比較結果のシェアを用いてシャッフル後の比較結果を復元する復元部を備える。 The secure computation system of the present invention is composed of one or more secure computation servers. Each secure computation server includes a recording unit, a random number assignment unit, a combining unit, a sorting unit, a comparison unit, a shuffling unit, and a deleting unit. The random number assignment unit associates a share [r n ] obtained by secretly sharing a random number r n with each record. The combining unit obtains a share [c n ∥r n ] of the combination c n ∥r n between a class c n in the record and the random number r n associated with the record. The sorting unit obtains a post-sorted share [ c n ∥r n ] by sorting the combination c n ∥r n . The comparison unit compares the sorted class c n with the class c n-k and associates the share of the comparison result with the share of the record. The shuffling unit shuffles the comparison result and obtains the share of the shuffled comparison result. The deleting unit deletes the share of the record associated with a comparison result indicating a match. Any of the secure computation servers or a secure computation client device connected to the secure computation system includes a restoration unit that restores the post-shuffle comparison result using the share of the post-shuffle comparison result.
本発明の秘密計算システムによれば、指定した属性のクラス比率が均等になるようにレコードを抽出するので、秘密計算を用いてレコード数の少ないクラスのデータを漏れなく抽出する方法を提供できる。 The secure computation system of the present invention extracts records so that the class ratio of the specified attribute is equal, providing a method for using secure computation to thoroughly extract data from classes with a small number of records.
実施形態の説明に先立ち、本開示で利用する基本的な技術概念を説明する。 Before explaining the embodiments, we will explain the basic technical concepts used in this disclosure.
[秘密分散技術]
秘密分散とは、データを複数の分散値に変換し、一定個数以上の分散値を用いれば元のデータを復元でき、一定個数未満の分散値からは元のデータを一切復元できなくする技術である。秘密分散の一種である(k,n)-秘密分散は、入力された平文をn個に分割した分散値をn個の計算主体に分散しておき、任意のk個の分散値が揃えば平文を復元でき、k個未満の分散値からは平文に関する一切の情報を得られないような秘密分散である。このとき、n,kは1以上の整数であり、n≧kである。(k,n)-秘密分散の代表的な例は、「A.Shamir, "How to share a secret", Communications of the ACM, Volume 22 Issue 11, pp. 612-613, 1979.(参考文献1)」に記載されている、Shamir秘密分散である。本開示で利用する秘密分散は、後述の秘密計算が利用可能な方法であればどのようなものであってもよい。
[Secret sharing technology]
Secret sharing is a technique in which data is converted into multiple shares, allowing the original data to be restored using a certain number of shares or more, but making it impossible to restore the original data using shares less than the certain number. (k,n)-secret sharing, a type of secret sharing, divides an input plaintext into n shares, distributes the shares to n computing entities, and allows the plaintext to be restored using any k shares, but no information about the plaintext can be obtained using shares less than k. Here, n and k are integers greater than or equal to 1, and n≧k. A typical example of (k,n)-secret sharing is Shamir secret sharing, described in A. Shamir, "How to share a secret," Communications of the ACM, Volume 22, Issue 11, pp. 612-613, 1979 (Reference 1). The secret sharing used in this disclosure may be any method that can utilize secure computation, as described below.
[秘密計算技術]
秘密計算は、複数の計算主体に計算対象のデータを秘密分散して保存しておき、元のデータを復元することなく他の計算主体と協力して元のデータの関数値の分散値を計算する技術である。秘密計算では要素技術として秘密分散を利用する。
[Secret computing technology]
Secure computation is a technology in which data to be computed is secretly shared and stored among multiple computing entities, and the computing entities cooperate to calculate the shared values of functions of the original data without recovering the original data.Secure computation uses secret sharing as an elemental technology.
本開示の第1実施形態で利用する秘密計算は、所望のデータ処理に必要な各種演算が特定の秘密分散方法による分散値に対して可能なものを適宜利用すればよい。秘密分散値に対して加算や乗算等の基本演算を行う秘密計算技術は、例えば、「千田浩司、濱田浩気、五十嵐大、高橋克己、"軽量検証可能3パーティ秘匿関数計算の再考"、コンピュータセキュリティシンポジウム2010、2010年(参考文献2)」に記載されている。データ列の秘密分散値から情報を秘匿したまま検索を行う秘密マッチング技術は、例えば、「千田浩司、寺田雅之、山口高康、五十嵐大、濱田浩気、高橋克巳、"統計的開示制御を考慮したセキュアマッチングプロトコル"、情報処理学会研究報告、2011-CSEC-52(12)、2011年(参考文献3)」に記載されている。データ列の秘密分散値を秘匿したまま整列する秘密ソート技術は、例えば、「濱田浩気、五十嵐大、千田浩司、高橋克巳、"秘匿関数計算上の線形時間ソート"、コンピュータセキュリティシンポジウム2011、2011年(参考文献4)」に記載されている。 The secure computation used in the first embodiment of the present disclosure may be one that allows the various operations required for the desired data processing to be performed on shared values obtained by a specific secret sharing method. A secure computation technique that performs basic operations such as addition and multiplication on secret shared values is described, for example, in "Koji Senda, Hiroki Hamada, Dai Igarashi, Katsumi Takahashi, "Rethinking Lightly Verifiable Three-Party Secure Function Computation," Computer Security Symposium 2010, 2010 (Reference 2)." A secret matching technique that searches for information from the secret shared values of a data string while keeping it secret is described, for example, in "Koji Senda, Masayuki Terada, Takayasu Yamaguchi, Dai Igarashi, Hiroki Hamada, Katsumi Takahashi, "Secure Matching Protocol Considering Statistical Disclosure Control," Information Processing Society of Japan Research Report, 2011-CSEC-52(12), 2011 (Reference 3)." A secret sorting technique that sorts data strings while keeping their secret sharing values secret is described, for example, in "Hamada Hiroki, Igarashi Dai, Senda Koji, Takahashi Katsumi, "Linear Time Sorting on Secure Function Computation," Computer Security Symposium 2011, 2011 (Reference 4)."
以下、本発明の実施の形態について、詳細に説明する。なお、同じ機能を有する構成部には同じ番号を付し、重複説明を省略する。 The following describes in detail an embodiment of the present invention. Components with the same functions are assigned the same numbers, and duplicate explanations will be omitted.
以下の説明では、Nは機械学習に用いるためのレコードの数、nは1以上N以下の整数、kはあらかじめ定められた正の整数とする。図1に、ネットワークを介して接続された複数の秘密計算サーバで構成される秘密計算システムと秘密計算クライアント装置の構成例を示す。図2に、秘密計算サーバの機能構成例を示す。図3に、実施例1のレコード抽出方法の処理フローを示す。図4はレコードの抽出方法のイメージを説明するための図である。 In the following explanation, N is the number of records to be used for machine learning, n is an integer between 1 and N, and k is a predetermined positive integer. Figure 1 shows an example configuration of a secure computation system and a secure computation client device, which are composed of multiple secure computation servers connected via a network. Figure 2 shows an example functional configuration of a secure computation server. Figure 3 shows the processing flow of the record extraction method of Example 1. Figure 4 is a diagram for explaining an image of the record extraction method.
秘密計算システム10はX台の秘密計算サーバ100-1,...,Xで構成される。Xは例えば3以上とすればよいが、後述の秘密計算が利用可能な方法であれば1以上のいずれの数でもよい。秘密計算システム10には、既存の技術を用いればよい(WO2012/046692(ファミリーの米国特許US8,989,391B2)に示された技術など、既存技術がある(参考文献5))。例えば、X=3の場合、1つのデータは3つのシェアに変換され、3台の秘密計算サーバ100-1,2,3は1つずつシェアを記録する。データを復元するためには、2つのシェアが必要である。秘密計算システム10では、データを秘密分散した状態(秘匿化した状態)を維持したまま、計算を行うことができ、結果も結果のシェアとして秘匿化した状態で各秘密計算サーバ100-xに記録される。結果が統計量であれば、2つ以上のシェアを取得し、復元することで統計量を知ることはできるが、元のデータの秘匿性は維持できる。 The secure computation system 10 is composed of X secure computation servers 100-1, ..., X. X may be, for example, 3 or greater, but can be any number greater than 1 as long as the method allows for secure computation, as described below. Existing technology can be used for the secure computation system 10 (such as the technology disclosed in WO2012/046692 (family U.S. Patent US8,989,391B2) (Reference 5)). For example, when X = 3, one piece of data is converted into three shares, and the three secure computation servers 100-1, 100-2, and 100-3 each record one share. Two shares are required to restore the data. In the secure computation system 10, calculations can be performed while maintaining the data in a secretly shared state (anonymized state), and the results are also recorded anonymously as shares of the results on each secure computation server 100-x. If the results are statistics, the statistics can be known by obtaining and restoring two or more shares, while maintaining the confidentiality of the original data.
秘密計算クライアント装置200は、秘密計算クライアント部210、AutoML部220、通信部280、記録部290を備える。通信部280は、ユーザ装置300と接続してもよい。この場合、通信部280とユーザ装置300との間の通信には暗号技術を用いればよい。記録部290には、AIモデルの学習に使用されるレコードが記録される。それぞれのレコードは、複数の属性のデータで構成される。属性とは、例えば、年齢、性別、基礎的な医療情報(血液中の成分の値など)、病名などである。 The secure computation client device 200 comprises a secure computation client unit 210, an AutoML unit 220, a communication unit 280, and a recording unit 290. The communication unit 280 may be connected to the user device 300. In this case, encryption technology may be used for communication between the communication unit 280 and the user device 300. The recording unit 290 records records used for training the AI model. Each record consists of data with multiple attributes. Attributes include, for example, age, gender, basic medical information (such as the values of blood components), and disease name.
秘密計算クライアント部210は、複数の秘密計算サーバ100-1,...,Xで構成される秘密計算システムと接続され、秘密分散部211と復元部212を有する。秘密計算クライアント部210は、秘密計算システム10へのレコードのシェア、指示情報のシェアの送信と、秘密計算システム10からの計算結果のシェア、評価値のシェアの受信を行う。ここで、「秘密計算システム10へのシェアの送信」とは、各秘密計算サーバ100-xに対応するシェアをそれぞれ送信することを意味し、「秘密計算システム10からのシェアの受信」とは、各秘密計算サーバ100-xからそれぞれが記録しているシェアを受信することを意味する。つまり、秘密分散部211は、レコード、指示情報などを複数のシェアに変換し、各秘密計算サーバ100-xに対応するシェアを送信する。指示情報とは、どの属性を用いてAIモデルの機械学習を行うのかなどの情報である。復元部212は、秘密計算システム10での計算結果のシェア、評価値のシェアを受信し、平文の計算結果、評価値を取得する。 The secure computation client unit 210 is connected to a secure computation system consisting of multiple secure computation servers 100-1,...,X, and has a secret sharing unit 211 and a restoration unit 212. The secure computation client unit 210 transmits shares of records and shares of instruction information to the secure computation system 10, and receives shares of calculation results and shares of evaluation values from the secure computation system 10. Here, "transmitting shares to the secure computation system 10" means transmitting shares corresponding to each secure computation server 100-x, and "receiving shares from the secure computation system 10" means receiving shares recorded by each secure computation server 100-x. In other words, the secret sharing unit 211 converts records, instruction information, etc. into multiple shares and transmits shares corresponding to each secure computation server 100-x. The instruction information is information such as which attributes to use for machine learning of the AI model. The restoration unit 212 receives shares of calculation results and shares of evaluation values from the secure computation system 10, and acquires plaintext calculation results and evaluation values.
秘密分散部211は、あらかじめAIモデルの機械学習に使用するレコードを複数のシェアに変換し、各秘密計算サーバ100-xに対応するシェアを送信する。各秘密計算サーバ100-xは、レコードを構成する属性のデータのシェアを記録部190に記録している。 The secret sharing unit 211 converts the records used for machine learning of the AI model into multiple shares in advance, and transmits the corresponding shares to each secure computation server 100-x. Each secure computation server 100-x records the shares of attribute data that make up the record in the recording unit 190.
図2に示すように、秘密計算サーバ100-xは、記録部190、乱数付与部110、結合部120、ソート部130、比較部140、シャッフル部150、削除部160、復元部180を備える。本発明では、レコードの集団の中から、指定した属性が同じクラスのレコードが均等になるようにレコードを抽出する。図4の集団には、4種類の丸があるが、その数には偏りがある。集団からランダムに丸を抽出すると集団の中に多く存在する丸を多く抽出することになり、数が少ない丸の数は少なくなる。一方、4種類の丸を当配分に抽出すれば、多く存在する丸も少ない丸も均等に抽出できる。 As shown in Figure 2, the secure computation server 100-x comprises a recording unit 190, a random number assignment unit 110, a combining unit 120, a sorting unit 130, a comparison unit 140, a shuffling unit 150, a deletion unit 160, and a restoration unit 180. In the present invention, records are extracted from a collection of records so that records of the same class with a specified attribute are evenly distributed. The collection in Figure 4 contains four types of circles, but there is a bias in their numbers. If circles are extracted randomly from the collection, more circles that are present in large numbers in the collection will be extracted, and the number of circles that are rare will decrease. On the other hand, if the four types of circles are extracted in an equal distribution, it is possible to extract both abundant and rare circles evenly.
例えば、医療診断にAIを利用する場合を考えてみる。健康診断のデータの場合、最も多いのは精密検査不要という結果であり、精密検査を行う必要がある場合も、多い病気と少ない病気がある。有効な機械学習をするためには、レコードを抽出する際に、症例が少ない病気のレコード(データ)を削除しないようにする必要がある。また、健康診断の結果に基づく精密検査の必要性の一次判断にAIを利用する場合、精密検査が必要かもしれないという結果が多めに出力される方が望ましい。そこで、本発明では、秘密計算を用いて指定した属性のクラス比率が均等になるようにレコードを抽出することで、レコード数の少ないクラスのデータを漏れなく抽出する。 For example, consider the use of AI in medical diagnosis. In the case of health checkup data, the most common result is that no further examination is necessary, and even when further examination is necessary, there are diseases that are common and diseases that are rare. In order to perform effective machine learning, it is necessary to avoid deleting records (data) for diseases with few cases when extracting records. Furthermore, when using AI to make an initial judgment on the need for further examination based on the results of a health checkup, it is desirable to output more results indicating that further examination may be necessary. Therefore, in this invention, secret computation is used to extract records so that the class ratios of specified attributes are equal, thereby extracting data from classes with few records without omission.
図5は、レコードの例である。20個のレコードがあり、各レコードは属性Yのデータを有している。属性Yのクラスには、”0”,”1”,”2”,”3”,”4”の5種類がある。秘密計算サーバ100-1,...,Xの乱数付与部110は、協調して乱数rnを生成し、乱数rnを秘密分散したシェア[rn]をレコードごとに対応付ける(S110)。「協調」に関しては、既存の秘密計算技術(WO2012/046692(ファミリーの米国特許US8,989,391B2)に示された技術(参考文献5))と同じであり、既存の技術を用いればよい。秘密計算サーバ100-1,...,Xの結合部120は、協調してレコード内のクラスcnとレコードと対応付けられた乱数rnとの結合cn∥rnのシェア[cn∥rn]を求める(S120)。図6は、レコードごとに乱数を生成し属性Yのデータに乱数を結合した例である。 FIG. 5 shows an example of records. There are 20 records, and each record has data of attribute Y. There are five classes of attribute Y: "0", "1", "2", "3", and "4". The random number assigning units 110 of the secure computation servers 100-1, ..., X cooperatively generate random numbers r n and associate the shares [r n ] obtained by secretly sharing the random numbers r n with each record (S110). The "cooperation" is the same as the existing secure computation technology (the technology disclosed in WO2012/046692 (US Patent US8,989,391B2 of the same family) (Reference 5)), so any existing technology may be used. The combining units 120 of the secure computation servers 100-1, ..., X cooperatively calculate the shares [c n ∥r n ] of the combination c n ∥r n between the class c n in the record and the random number r n associated with the record (S120). FIG. 6 shows an example in which a random number is generated for each record and linked to the data of attribute Y.
秘密計算サーバ100-1,...,Xのソート部130は、協調して結合cn∥rnをソートしたソート後シェア[cn∥rn]を求める(S130)。図7は結合cn∥rnに基づいて、レコードをソートした結果を示している。つまり、結合cn∥rnのソートではデータcnも一緒にソートする。 The sorting units 130 of the secure computation servers 100-1,...,X cooperate to sort the join cn∥r n and obtain the post-sort share [ cn∥r n ] (S130). Fig. 7 shows the results of sorting records based on the join cn∥r n . In other words, when sorting the join cn∥r n , the data cn is also sorted.
秘密計算サーバ100-1,...,Xの比較部140は、協調してソート後のクラスcnとクラスcn-kを比較し、比較結果のシェアをレコードのシェアに対応付ける(S140)。kは、秘密計算クライアント装置200が秘密計算システム10に指定した整数であり、同じクラスのレコードを何個残すかを示している。図8はソート後にk個前のデータと比較した結果を示す図である。ソート後のk個前のデータと異なる場合は比較結果として”0”が出力され、同じ場合に比較結果として”1”が出力される。図8の例ではk=2である。 The comparison units 140 of the secure computation servers 100-1,...,X cooperate to compare the sorted class c n with the class c n-k , and associate the share of the comparison result with the share of the record (S140). k is an integer specified by the secure computation client device 200 to the secure computation system 10, and indicates how many records of the same class are to be left. FIG. 8 is a diagram showing the result of a comparison with the k-th data after sorting. If the sorted data is different from the k-th data before it, "0" is output as the comparison result, and if the data is the same, "1" is output as the comparison result. In the example of FIG. 8, k=2.
秘密計算サーバ100-1,...,Xのシャッフル部150は、協調して比較結果をシャッフルし、シャッフル後の比較結果のシェアを求める(S150)。いずれかの秘密計算サーバ100-xの復元部180、もしくは当該秘密計算システム10に接続された秘密計算クライアント装置200の復元部212が、シャッフル後の比較結果のシェアを用いてシャッフル後の比較結果を復元する(S212)。復元するのは、比較結果だけであり属性Yのデータ自体は復元しない。図3の例は秘密計算クライアント装置200の復元部212が復元した例である。 The shuffling units 150 of the secure computation servers 100-1, ..., X cooperatively shuffle the comparison results and determine the share of the comparison results after shuffling (S150). The restoration unit 180 of one of the secure computation servers 100-x, or the restoration unit 212 of the secure computation client device 200 connected to the secure computation system 10, restores the comparison results after shuffling using the share of the comparison results after shuffling (S212). Only the comparison results are restored; the data of attribute Y itself is not restored. The example in Figure 3 is an example restored by the restoration unit 212 of the secure computation client device 200.
その後、秘密計算サーバ100-1,...,Xの削除部160は、協調して一致を意味する比較結果が対応付けられているレコードのシェアを削除する(S160)。図9は、シャッフルは行っていない状態で比較結果が”1”のレコードを削除した例を示している。図8の例ではk=2であり、クラスの種類が5個だったので、図9では10個のレコードが残っている。 Then, the deletion units 160 of the secure computation servers 100-1, ..., X cooperatively delete the shares of records associated with comparison results indicating a match (S160). Figure 9 shows an example in which records with a comparison result of "1" are deleted without shuffling. In the example of Figure 8, k = 2 and there were five types of classes, so in Figure 9, 10 records remain.
属性Yのデータは秘密分散されて秘密計算サーバ100-1,...,Xが記録しており、秘密計算でレコードを抽出する。よって、データの秘匿性は維持した上で属性のクラス比率が均等になるようにレコードを抽出できる。また、等配分法は指定した属性のクラス比率を均等にすることで属性の偏りによるAI学習への影響をなくせる。その実現により、レコード数の少ないクラスのデータを漏れなく抽出しつつ、AI分析をより高速に処理できる。 The data for attribute Y is secretly distributed and recorded on secure computation servers 100-1,...,X, and records are extracted using secure computation. This allows records to be extracted so that the attribute class ratio is equal while maintaining the confidentiality of the data. Furthermore, the equal allocation method equalizes the class ratio of specified attributes, eliminating the impact of attribute bias on AI learning. This allows for faster processing of AI analysis while ensuring that data from classes with a small number of records is extracted without omission.
上述のように、本発明は、2つのレコードを比較して重複排除するので、秘密計算AIで等配分法を実現できるため、学習するレコード数を削減し、AI分析をより高速に処理できる。層化抽出法は、直前でなく抽出したい個数kの幅を確保して比較することで任意の個数を抽出できる。層化抽出法の一つである等配分法は、指定した属性のクラス比率を均等にすることで属性の偏りによる影響をなくせる。等配分法では現在のレコードがk個前のレコードと同じクラスか、つまり同じクラスがk個連続しているかのみを判定するため、レコード数の少ないクラスのデータを漏れなく抽出できる。判定するのは「同じクラス」か「違うクラス」かだけで、違うクラスの中身が何番目のクラスであるかは判定不要で、また、処理を変える必要もないため、高速処理が実現される。 As described above, this invention compares two records and eliminates duplicates, making it possible to implement the equal distribution method with secure computing AI, thereby reducing the number of records to be learned and enabling faster AI analysis. Stratified sampling can extract any number of records by ensuring a range of the number k you want to extract and comparing them instead of just before. Equal distribution, one of the stratified sampling methods, can eliminate the impact of attribute bias by equalizing the class ratio of a specified attribute. Equal distribution only determines whether the current record is of the same class as the k records before it, in other words, whether there are k consecutive records of the same class, so it can extract data from classes with a small number of records without omitting anything. The only thing that needs to be determined is whether they are "same class" or "different class," and there is no need to determine which class the contents of the different class are, and there is no need to change the processing, allowing for high-speed processing.
秘密計算AIのレコード抽出法として等分配法を実現し、処理が高速化されるため、これまで実現できなかった秘密計算AIを現実的な条件(許容範囲内の計算時間)で実装し、全体としてデータを秘匿できる。単純なランダムサンプリングを使ったレコード抽出による効率化だとレコード数の少ないクラスのデータが抽出されない恐れがある。しかし、等分配法は、レコード数がk個以下の場合は単純にすべてのレコードが削除されず抽出できる。複数企業の蓄積するデータを持ち寄って、安全にデータを利活用でき、また、秘密計算AIの適用性拡大(AIによるデータ分析等)、導入容易性向上(事業アセット連携等)を図り、幅広いユーザへの展開が可能となる。
[変形例1]
By implementing the equipartition method as a record extraction method for secure computation AI and speeding up processing, it is now possible to implement secure computation AI under realistic conditions (within acceptable computation time), which was previously unattainable, and keep data confidential as a whole. Improving efficiency by using simple random sampling to extract records risks not extracting data from classes with a small number of records. However, the equipartition method allows for extraction without simply deleting all records when the number of records is k or less. Data accumulated by multiple companies can be pooled and used safely. Furthermore, it expands the applicability of secure computation AI (e.g., data analysis using AI) and improves ease of implementation (e.g., business asset integration), enabling deployment to a wide range of users.
[Modification 1]
ただし、同じクラスのレコードがk個未満の場合は、クラスの種類にkを乗じた数よりも残るレコードの数が少なくなる。この場合、同じクラスのレコードがk個未満のデータがどの程度の数あったのかは分かる。そこで、変形例1では同じクラスのレコードがk個未満のデータが存在したか否か分からない処理を付加する。図2に示すように、秘密計算サーバ100-xは、レコード数取得部170、不足数算出部175、ダミー付加部185も備える。図10に、変形例1のレコード抽出方法の処理フローを示す。 However, if there are fewer than k records of the same class, the number of remaining records will be less than the number obtained by multiplying the class type by k. In this case, it is possible to determine how many records of the same class there are that have fewer than k records. Therefore, in variant 1, a process is added that determines whether or not there is data with fewer than k records of the same class. As shown in Figure 2, the secure computation server 100-x also includes a record number acquisition unit 170, a shortage number calculation unit 175, and a dummy addition unit 185. Figure 10 shows the processing flow of the record extraction method of variant 1.
ステップS110~S140は、図3と同じである。秘密計算サーバ100-1,...,Xのレコード数取得部170は、協調してクラスcnごとに、同じクラスのレコード数mLのシェア[mL]を求める(S170)。秘密計算サーバ100-1,...,Xの不足数算出部175は、協調して同じクラスのレコード数mLがk未満のクラスに対して不足数si=k-mLのシェア[si]を求め、求めたすべての不足数siの合計Sのシェア[S]を求める(S175)。秘密計算サーバ100-1,...,Xのダミー付加部185は、協調してS個のダミーデータのシェアをレコードのシェアに付加する(S185)。ステップS150とステップS212は、実施例1と同じである。 Steps S110 to S140 are the same as in FIG. 3. The record number acquisition units 170 of the secure computation servers 100-1, ..., X cooperatively calculate the shares [m L ] of the number of records m L of the same class for each class c n (S170). The shortage calculation units 175 of the secure computation servers 100-1, ..., X cooperatively calculate the shares [si] of the shortage si = k - m L for classes where the number of records m L of the same class is less than k, and calculate the shares [S] of the total S of all the calculated shortages si (S175). The dummy addition units 185 of the secure computation servers 100-1, ..., X cooperatively add S shares of dummy data to the shares of the records (S185). Steps S150 and S212 are the same as in the first embodiment.
その後、秘密計算サーバ100-1,...,Xの削除部160は、協調して一致を意味する比較結果が対応付けられているレコードのシェアと、ダミーデータのシェアを削除する(S161)。なお、不足数の合計Sを積極的に知らせたい場合は、いずれかの秘密計算サーバ100-xの復元部180、もしくは秘密計算システム10に接続された秘密計算クライアント装置200の復元部212が、シェア[S]を用いて合計Sを復元してもよい。 Then, the deletion units 160 of the secure computation servers 100-1, ..., X cooperatively delete the shares of the records associated with the comparison results indicating a match, and the shares of the dummy data (S161). Note that if it is desired to proactively notify the total number of missing shares, S, the restoration unit 180 of any of the secure computation servers 100-x, or the restoration unit 212 of the secure computation client device 200 connected to the secure computation system 10, may restore the total S using the shares [S].
本変形例の場合も、実施例1と同様に、データの秘匿性は維持した上で属性のクラス比率が均等になるようにレコードを抽出できる。また、等配分法は指定した属性のクラス比率を均等にすることで属性の偏りによるAI学習への影響をなくせる。その実現により、レコード数の少ないクラスのデータを漏れなく抽出しつつ、AI分析をより高速に処理できる。 In this modified example, as in Example 1, records can be extracted so that the class ratio of attributes is equal while maintaining data confidentiality. Furthermore, the equal allocation method can eliminate the impact of attribute bias on AI learning by equalizing the class ratio of specified attributes. This makes it possible to extract data from classes with a small number of records without omission, while also processing AI analysis more quickly.
[プロセッサ、プログラム、記録媒体]
本明細書中に記載されている構成要素により実現される機能は、当該記載された機能を実現するようにプログラムされた、汎用プロセッサ、特定用途プロセッサ、集積回路、ASICs(Application Specific Integrated Circuits)、CPU(a Central Processing Unit)、従来型の回路、および/又はそれらの組合せを含む、circuitry又はprocessing circuitryにおいて実装されてもよい。プロセッサは、トランジスタやその他の回路を含み、 circuitry又はprocessing circuitryとみなされる。プロセッサは、メモリに格納されたプログラムを実行する、programmed processorであってもよい。
[Processor, program, recording medium]
The functions performed by the components described herein may be implemented in circuitry or processing circuitry, including general-purpose processors, application-specific processors, integrated circuits, ASICs (Application Specific Integrated Circuits), a CPU (a Central Processing Unit), conventional circuits, and/or combinations thereof, programmed to perform the described functions. A processor includes transistors and other circuits and is considered to be circuitry or processing circuitry. A processor may also be a programmed processor that executes programs stored in memory.
本明細書において、circuitry、ユニット、手段は、記載された機能を実現するようにプログラムされたハードウェア、又は実行するハードウェアである。当該ハードウェアは、本明細書に開示されているあらゆるハードウェア、又は、当該記載された機能を実現するようにプログラムされた、又は、実行するものとして知られているあらゆるハードウェアであってもよい。 In this specification, a circuit, unit, or means refers to hardware that is programmed to realize or executes the described functions. The hardware may be any hardware disclosed in this specification or any hardware known to be programmed to realize or execute the described functions.
当該ハードウェアがcircuitryのタイプであるとみなされるプロセッサである場合、当該circuitry、手段、又はユニットは、ハードウェアと、当該ハードウェア及び又はプロセッサを構成する為に用いられるソフトウェアの組合せである。 If the hardware is a processor, which is considered to be a type of circuitry, the circuitry, means, or unit is the combination of the hardware and the software used to configure the hardware and/or processor.
上述の各種の処理は、図11に示すコンピュータ2000の記録部2020に、上記方法の各ステップを実行させるプログラムを読み込ませ、制御部2010、入力部2030、出力部2040、表示部2050などに動作させることで実施できる。 The various processes described above can be implemented by loading a program that executes each step of the above method into the recording unit 2020 of the computer 2000 shown in FIG. 11 and operating the control unit 2010, input unit 2030, output unit 2040, display unit 2050, etc.
この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、例えば、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等どのようなものでもよい。 The program describing this processing can be recorded on a computer-readable recording medium. Examples of computer-readable recording media include magnetic recording devices, optical disks, magneto-optical recording media, and semiconductor memory.
また、このプログラムの流通は、例えば、そのプログラムを記録したDVD、CD-ROM等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。 In addition, this program may be distributed, for example, by selling, transferring, or lending portable recording media such as DVDs or CD-ROMs on which the program is recorded. Furthermore, the program may be stored in a storage device on a server computer, and then transferred from the server computer to other computers via a network, thereby distributing the program.
このようなプログラムを実行するコンピュータは、例えば、まず、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、一旦、自己の記憶装置に格納する。そして、処理の実行時、このコンピュータは、自己の記録媒体に格納されたプログラムを読み取り、読み取ったプログラムに従った処理を実行する。また、このプログラムの別の実行形態として、コンピュータが可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することとしてもよく、さらに、このコンピュータにサーバコンピュータからプログラムが転送されるたびに、逐次、受け取ったプログラムに従った処理を実行することとしてもよい。また、サーバコンピュータから、このコンピュータへのプログラムの転送は行わず、その実行指示と結果取得のみによって処理機能を実現する、いわゆるASP(Application Service Provider)型のサービスによって処理を実行する構成としてもよい。さらには、サーバコンピュータの一部をプログラムと共にユーザに使用させる、いわゆるSaaS(Software as a Service)型のサービスを利用して、端末の処理を実行する構成としてもよい。なお、本形態におけるプログラムには、電子計算機による処理の用に供する情報であってプログラムに準ずるもの(コンピュータに対する直接の指令ではないがコンピュータの処理を規定する性質を有するデータ等)を含むものとする。 A computer that executes such a program, for example, first stores the program recorded on a portable recording medium or transferred from a server computer in its own storage device. Then, when executing processing, the computer reads the program stored on its own recording medium and executes processing in accordance with the read program. As another form of execution of this program, the computer may read the program directly from the portable recording medium and execute processing in accordance with that program, or it may execute processing in accordance with the received program each time a program is transferred to this computer from the server computer. Alternatively, the server computer may not transfer the program to this computer, but rather executes processing using a so-called ASP (Application Service Provider) type service, which realizes processing functions simply by issuing execution instructions and obtaining results. Furthermore, the computer may be configured to execute terminal processing using a so-called SaaS (Software as a Service) type service, which allows users to use part of a server computer along with the program. In this embodiment, the program includes information used for processing by an electronic computer that is equivalent to a program (such as data that is not a direct command to a computer but has properties that dictate computer processing).
また、この形態では、コンピュータ上で所定のプログラムを実行させることにより、本装置を構成することとしたが、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。 In addition, in this embodiment, the device is configured by executing a specific program on a computer, but at least part of the processing may also be implemented in hardware.
10 秘密計算システム 100 秘密計算サーバ
110 乱数付与部 120 結合部
130 ソート部 140 比較部
150 シャッフル部 160 削除部
170 レコード数取得部 175 不足数算出部
180 復元部 185 ダミー付加部
190 記録部 200 秘密計算クライアント装置
210 秘密計算クライアント部 211 秘密分散部
212 復元部 220 AutoML部
280 通信部 290 記録部
300 ユーザ装置
REFERENCE SIGNS LIST 10 Secure computation system 100 Secure computation server 110 Random number assignment unit 120 Combination unit 130 Sorting unit 140 Comparison unit 150 Shuffling unit 160 Deletion unit 170 Record number acquisition unit 175 Deficiency number calculation unit 180 Restoration unit 185 Dummy addition unit 190 Recording unit 200 Secure computation client device 210 Secure computation client unit 211 Secret sharing unit 212 Restoration unit 220 AutoML unit 280 Communication unit 290 Recording unit 300 User device
Claims (4)
Nは機械学習に用いるためのレコードの数、nは1以上N以下の整数、kはあらかじめ定められた正の整数であり、
各秘密計算サーバは、
レコードを秘密分散したシェアを記録する記録部と、
乱数rnを秘密分散したシェア[rn]をレコードごとに対応付ける乱数付与部と、
レコード内のクラスcnとレコードと対応付けられた乱数rnとの結合cn∥rnのシェア[cn∥rn]を求める結合部と、
結合cn∥rnをソートしたソート後シェア[cn∥rn]を求めるソート部と、
前記ソート後のクラスcnとクラスcn-kを比較し、比較結果のシェアを前記レコードのシェアに対応付ける比較部と、
前記比較結果をシャッフルし、シャッフル後の比較結果のシェアを求めるシャッフル部と、
一致を意味する比較結果が対応付けられているレコードのシェアを削除する削除部と
を備え、
前記秘密計算サーバのいずれか、もしくは当該秘密計算システムに接続された秘密計算クライアント装置が、シャッフル後の比較結果のシェアを用いてシャッフル後の比較結果を復元する復元部を備える
ことを特徴とする秘密計算システム。 A secure computation system comprising one or more secure computation servers,
N is the number of records to be used for machine learning, n is an integer between 1 and N, and k is a predetermined positive integer.
Each secure computation server:
a recording unit that records the secretly shared shares of the record;
a random number assigning unit that associates a share [ rn ] obtained by secretly sharing a random number rn with each record;
a combining unit that calculates a share [c n ∥r n ] of a combining c n ∥r n between a class c n in a record and a random number r n associated with the record;
a sorting unit that sorts the join c n ∥r n to obtain a sorted share [c n ∥r n ];
a comparison unit that compares the sorted class c n with the class c n−k and associates the share of the comparison result with the share of the record;
a shuffling unit that shuffles the comparison results and determines a share of the comparison results after shuffling;
a deletion unit that deletes a share of a record associated with a comparison result indicating a match,
A secure computation system characterized in that any of the secure computation servers or a secure computation client device connected to the secure computation system comprises a restoration unit that restores the post-shuffle comparison result using the share of the post-shuffle comparison result.
各秘密計算サーバは、
クラスcnごとに、同じクラスのレコード数mLのシェア[mL]を求めるレコード数取得部と、
同じクラスのレコード数mLがk未満のクラスに対して不足数si=k-mLのシェア[si]を求め、求めたすべての不足数siの合計Sのシェア[S]を求める不足数算出部と、
S個のダミーデータのシェアを前記レコードのシェアに付加するダミー付加部と
も備える
ことを特徴とする秘密計算システム。 2. The secure computing system according to claim 1,
Each secure computation server:
a record number obtaining unit that obtains a share [m L ] of the number of records m L of the same class for each class c n ;
a shortage calculation unit that calculates a share [si] of the shortage si=k-m L for classes where the number of records m L of the same class is less than k, and calculates a share [S] of the total S of all the calculated shortages si;
A secure computing system further comprising a dummy adding unit that adds S shares of dummy data to the shares of the record.
さらに、前記秘密計算システムを、ネットワークを介して使用するための秘密計算クライアント装置を有する
ことを特徴とする秘密計算システム。 3. The secure computing system according to claim 1,
The secure computation system further comprises a secure computation client device for using the secure computation system via a network.
Nは機械学習に用いるためのレコードの数、nは1以上N以下の整数、kはあらかじめ定められた正の整数であり、
各秘密計算サーバは、記録部にレコードを秘密分散したシェアを記録しておき、
各秘密計算サーバが協調して、
乱数rnを秘密分散したシェア[rn]をレコードごとに対応付け、
レコード内のクラスcnとレコードと対応付けられた乱数rnとの結合cn∥rnのシェア[cn∥rn]を求め、
結合cn∥rnをソートしたソート後シェア[cn∥rn]を求め、
前記ソート後のクラスcnとクラスcn-kを比較し、比較結果のシェアを前記レコードのシェアに対応付け、
前記比較結果をシャッフルし、シャッフル後の比較結果のシェアを求め、
一致を意味する比較結果が対応付けられているレコードのシェアを削除する
レコード抽出方法。 A record extraction method using a secure computation system configured with one or more secure computation servers,
N is the number of records to be used for machine learning, n is an integer between 1 and N, and k is a predetermined positive integer.
Each secure computation server records the secretly shared shares in the recorder.
Each secure computation server cooperates to
A share [ rn ] obtained by secretly sharing the random number rn is associated with each record,
A share [c n ∥r n ] of the connection c n ∥r n between the class c n in the record and the random number r n associated with the record is calculated.
The combined c n ∥r n is sorted to obtain the post-sort share [c n ∥r n ].
The sorted class c n is compared with the class c n−k , and the share of the comparison result is associated with the share of the record;
shuffling the comparison results and determining a share of the comparison results after shuffling;
A record extraction method that deletes the share of records for which the comparison result indicates a match.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2024/024395 WO2026009409A1 (en) | 2024-07-05 | 2024-07-05 | Secure computation system and record extraction method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2024/024395 WO2026009409A1 (en) | 2024-07-05 | 2024-07-05 | Secure computation system and record extraction method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2026009409A1 true WO2026009409A1 (en) | 2026-01-08 |
Family
ID=98317966
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2024/024395 Pending WO2026009409A1 (en) | 2024-07-05 | 2024-07-05 | Secure computation system and record extraction method |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2026009409A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2012150764A (en) * | 2011-01-21 | 2012-08-09 | Nippon Telegr & Teleph Corp <Ntt> | Secure retrieval system, secret retrieval apparatus, secure retrieval method, and secure retrieval program |
| WO2015107951A1 (en) * | 2014-01-17 | 2015-07-23 | 日本電信電話株式会社 | Secure computation method, secure computation system, sorting device, and program |
| WO2022259333A1 (en) * | 2021-06-07 | 2022-12-15 | 日本電気株式会社 | Learning method |
-
2024
- 2024-07-05 WO PCT/JP2024/024395 patent/WO2026009409A1/en active Pending
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2012150764A (en) * | 2011-01-21 | 2012-08-09 | Nippon Telegr & Teleph Corp <Ntt> | Secure retrieval system, secret retrieval apparatus, secure retrieval method, and secure retrieval program |
| WO2015107951A1 (en) * | 2014-01-17 | 2015-07-23 | 日本電信電話株式会社 | Secure computation method, secure computation system, sorting device, and program |
| WO2022259333A1 (en) * | 2021-06-07 | 2022-12-15 | 日本電気株式会社 | Learning method |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Wei et al. | Vertical federated learning: Challenges, methodologies and experiments | |
| Yu et al. | Enabling cloud storage auditing with key-exposure resistance | |
| Li et al. | Privacy-preserving outsourced classification in cloud computing | |
| US11003681B2 (en) | Anonymization system | |
| Sharma et al. | PrivateGraph: Privacy-preserving spectral analysis of encrypted graphs in the cloud | |
| US20130287210A1 (en) | Data processing apparatus and data storage apparatus | |
| US20100058476A1 (en) | Electronic information retention method/system, electronic information split retention method/system, electronic information split restoration processing method/system, and programs for the same | |
| Preethi et al. | Modelling LSUTE: PKE Schemes for Safeguarding Electronic Healthcare Records Over Cloud Communication Environment. | |
| JP4256897B2 (en) | Apparatus, method and program for providing matching service | |
| Backes et al. | Anonymous ram | |
| Falk et al. | GigaDORAM: Breaking the Billion Address Barrier. | |
| Kantarcioglu et al. | An architecture for privacy-preserving mining of client information | |
| Chen et al. | Secure remote cloud file sharing with attribute-based access control and performance optimization | |
| JP5758315B2 (en) | Anonymous data providing system, anonymous data device, and method executed by them | |
| Simić et al. | A review on generating random numbers in decentralised environments | |
| Duan et al. | Practical distributed privacy-preserving data analysis at large scale | |
| CN107210005B (en) | Matrix/key generation device, matrix/key generation system, matrix combination device, matrix/key generation method, and program | |
| Pallas et al. | Three tales of disillusion: Benchmarking property preserving encryption schemes | |
| WO2026009409A1 (en) | Secure computation system and record extraction method | |
| JP2013156719A (en) | Anonymous data providing system, anonymous data device, and method performed thereby | |
| Wang et al. | A secure data sharing scheme with cheating detection based on Chaum-Pedersen protocol for cloud storage | |
| EP4092585A1 (en) | Control method, control program, and information processing device | |
| Eltayesh et al. | Refined game-theoretic approach to improve authenticity of outsourced databases | |
| CN119150359B (en) | Privacy protection ultra-high-dimensional feature screening method in longitudinal federal learning | |
| Kanakamedala et al. | Attribute-based storage supporting secure deduplication of encrypted data in cloud |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24946396 Country of ref document: EP Kind code of ref document: A1 |