JP7322308B2

JP7322308B2 - Circuit information, computing device, computing method and program

Info

Publication number: JP7322308B2
Application number: JP2023014423A
Authority: JP
Inventors: Kosuke Tatsumura; Hayato Goto
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2018-09-18
Filing date: 2023-02-02
Publication date: 2023-08-07
Anticipated expiration: 2038-09-18
Also published as: JP2023052843A; JP7551863B2; JP2023156349A

Description

Embodiments of the present invention relate to circuit information, a computing device, a computing method, and a program.

Various algorithms have been proposed for solving optimization problems using an Ising model. Hardware that solves optimization problems using an Ising model has also been proposed.

Hardware that solves such optimization problems preferably has a simple configuration and solves problems at high speed. It should also be able to handle a large number of variables. Moreover, it should be able to adapt flexibly to a change in the number of variables without a major design change.

Japanese Patent No. 5865456; JP 2018-005541 A; JP 2018-010474 A

T. Inagaki et al., “A coherent Ising machine for 2000-node optimization problems”, Science 354, 603 (2016).
H. Goto, “Bifurcation-based adiabatic quantum computation with a nonlinear oscillator network”, Sci. Rep. 6, 21686 (2016).
Y. Haribara et al., “Performance evaluation of coherent Ising machines against classical neural networks”, Quantum Sci. Technol. 2, 044002 (2017).

The problem to be solved by the invention is to solve an optimization problem with a simple configuration.

Circuit information according to an embodiment represents the configuration of a circuit and is described in a hardware description language. The circuit information causes the circuit to function as a computing device that solves an optimization problem using an Ising model. The computing device includes a first variable memory, a second variable memory, a matrix multiplication unit, a time evolution unit, a management unit, and an output unit. The matrix multiplication unit calculates N second intermediate variables at a first time by matrix-multiplying N first intermediate variables at the first time (N is an integer of 2 or more) by a coefficient matrix containing preset coefficients in N rows × N columns. Based on the N second intermediate variables at the first time, the time evolution unit calculates N first variables and N second variables at a second time, which is one sampling period after the first time, writes the N first variables at the second time into the first variable memory, and writes the N second variables at the second time into the second variable memory. The management unit increases the time from a start time by one sampling period at a time and causes the matrix multiplication unit and the time evolution unit to execute the processing for each time. The output unit outputs the N first variables at a preset end time. The N first variables correspond to N spins in the Ising model. The N second variables correspond to the N spins. The N spins correspond to N points. Each of the N first variables represents the position of the point that corresponds to its spin among the N points. Each of the N second variables represents the momentum of the point that corresponds to its spin among the N points. The N first intermediate variables correspond to the N first variables, and each of the N first intermediate variables is either the corresponding first variable or a value obtained by multiplying the corresponding first variable by a preset coefficient. The N second intermediate variables correspond to the N second variables.

FIG. 1 is a configuration diagram of a computing device according to a first embodiment.
FIG. 2 is a flowchart showing the flow of processing in a calculation unit.
FIG. 3 is a configuration diagram of a calculation unit according to a second embodiment.
FIG. 4 is a diagram showing the relationship between variables and a coefficient matrix in the second embodiment.
FIG. 5 is a diagram showing a first example of the configuration of a function calculation unit.
FIG. 6 is a diagram showing a second example of the configuration of the function calculation unit.
FIG. 7 is a diagram showing a third example of the configuration of the function calculation unit.
FIG. 8 is a diagram showing a fourth example of the configuration of the function calculation unit.
FIG. 9 is a diagram showing the relationship between variables and a coefficient matrix in a third embodiment.
FIG. 10 is a configuration diagram of a matrix multiplication unit according to the third embodiment.
FIG. 11 is a diagram showing an implementation example according to the third embodiment.
FIG. 12 is a diagram showing the relationship between variables and a coefficient matrix in a fourth embodiment.
FIG. 13 is a configuration diagram of a partitioned matrix multiplication unit according to the fourth embodiment.
FIG. 14 is a diagram showing the relationship between variables and a coefficient matrix in a fifth embodiment.
FIG. 15 is a configuration diagram of a sub-matrix multiplication unit according to a sixth embodiment.
FIG. 16 is a diagram showing a partitioned matrix stored in a partitioned matrix memory.
FIG. 17 is a diagram showing sub-matrices sent to a multiply-accumulate unit.
FIG. 18 is a configuration diagram of the multiply-accumulate unit.
FIG. 19 is a diagram showing parameters and timing in the fifth embodiment.
FIG. 20 is a configuration diagram of a time evolution unit according to the sixth embodiment.
FIG. 21 is a timing chart of the first intermediate variables and the second intermediate variables.

Hereinafter, the computing device 10 according to the embodiments will be described in detail with reference to the drawings. The computing device 10 according to the embodiments is intended to solve an optimization problem using an Ising model.

(Premise)
First, the premise of the processing executed in the computing device 10 will be described.

The energy E_Ising of the Ising model is represented by the following equation (1).

In equation (1), N represents the number of spins. s_i represents the state of the i-th spin; for example, s_i = ±1. s_j represents the state of the j-th spin; for example, s_j = ±1. i and j are integers from 0 to (N−1).

In equation (1), J_{i,j} is the coefficient in row i and column j of the coefficient matrix J of N rows × N columns. J_{i,j} represents the interaction between the i-th spin and the j-th spin. The coefficient matrix J is, for example, a real symmetric matrix whose diagonal elements are all zero. h_i is the i-th coefficient in a coefficient array h of N coefficients. h_i represents the action applied to the i-th spin alone. The problem of searching for the spin state (ground state) that minimizes the energy E_Ising of the Ising model is called the Ising problem. A machine that solves the Ising problem may be called an Ising machine. An Ising machine receives the coefficient matrix J and the coefficient array h as inputs, and calculates and outputs the ground state or an approximate solution with lower energy.

A classical model of a quantum bifurcation machine (hereinafter referred to as a classical bifurcation machine) has been proposed. The classical bifurcation machine calculates the optimum solution of equation (1) using equations of motion expressed by the simultaneous ordinary differential equations shown in equations (2), (3), and (4).

In equations (2), (3), and (4), N represents the number of mass points and is an integer of 2 or more. N corresponds to the number of spins. x_i is a real number representing the position of the i-th mass point. y_i is a real number representing the momentum of the i-th mass point. i and j are integers from 0 to (N−1).

In equations (2), (3), and (4), J_{i,j} is the coefficient in row i and column j of a predetermined coefficient matrix J containing coefficients in N rows × N columns. The coefficient matrix J is, for example, a real symmetric matrix. h_i is the i-th coefficient in a predetermined coefficient array of N coefficients. In equations (2), (3), and (4), the h_i term may be omitted.

In equations (2), (3), and (4), D is a constant corresponding to, for example, detuning. c is a constant. K is a constant corresponding to, for example, the Kerr coefficient. For example, D, c, and K are predetermined.

In equations (2), (3), and (4), t represents time. p(t) is, for example, a function of t and represents the pump rate. a(t) is a function of t. a(t) is represented, for example, by the following equation (5).

The classical bifurcation machine increases t from zero to a sufficiently large value in small time increments and, at each t, updates x_i and y_i using equations (2) and (3). The classical bifurcation machine then outputs the sign (±1) of the final value of x_i, reached when t is sufficiently large, as the optimum solution for s_i. In this way, the classical bifurcation machine treats the Ising model as a Hamiltonian dynamical system whose Hamiltonian is H in equations (2), (3), and (4).

Simulated annealing is also known as a method for calculating the optimum solution of an Ising model. This method employs a sequential update algorithm, which selects the spins one at a time and updates them in turn. Such a sequential update algorithm is not suited to parallel computation and is difficult to speed up.

In contrast, an algorithm is conceivable in which the equations of motion of the classical bifurcation machine are solved on a digital computer by a discrete solution method. Unlike simulated annealing, this algorithm can update multiple variables simultaneously.

However, the classical bifurcation machine has to perform the matrix multiplication with the coefficient matrix J, which is the most computationally expensive operation, in order to calculate both x_i and y_i. In addition, the classical bifurcation machine has to solve the equations of motion expressed by equations (2), (3), and (4) with a computationally expensive discrete solution method (such as the fourth-order Runge-Kutta method). As a result, the amount of computation required by the classical bifurcation machine is enormous.

In contrast, the computing device 10 according to the embodiments calculates the optimum solution of equation (1) using equations of motion expressed by the new simultaneous ordinary differential equations shown in the following equations (6), (7), and (8).

In equations (6), (7), and (8), N, x_i, y_i, i, j, J_{i,j}, h_i, D, c, K, t, p(t), and a(t) are the same as in equations (2) to (4).

The computing device 10 according to the embodiments increases t from zero to a sufficiently large value in small time increments and, at each t, updates x_i and y_i using equations (6) and (7). The computing device 10 then outputs the sign (±1) of the final value of x_i, reached when t is sufficiently large, as the optimum solution for s_i. In this way, the computing device 10 solves the Ising problem by simulating the time evolution of a Hamiltonian dynamical system whose Hamiltonian is H in equation (8).

In equations (6), (7), and (8), the matrix multiplication with the coefficient matrix J, which is the most computationally expensive operation, appears in equation (7) but not in equation (6). The computing device 10 therefore needs to perform this matrix multiplication only to update y_i and not to update x_i. In addition, p(t) has been eliminated from equation (6), which gives the time derivative of x_i (dx_i/dt). The computing device 10 can therefore calculate the optimum solution of the Ising model with a small amount of computation.

Furthermore, equation (6), which gives the time derivative of x_i (dx_i/dt), contains y_i but not x_i. Equation (7), which gives the time derivative of y_i (dy_i/dt), contains x_i but not y_i.

That is, when equations (6) and (7) are used, x_i and y_i are separated from each other in the Hamiltonian. The computing device 10 can therefore update x_i and y_i by applying a stable discrete solution method that requires little computation. For example, the computing device 10 applies the symplectic Euler method or a similar method to update x_i and y_i. The computing device 10 can thus calculate the optimum solution of an optimization problem using an Ising model with simple operations and a simple configuration.
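The benefit of a separable Hamiltonian can be illustrated on a toy system. The following is a minimal sketch and not the equations of the embodiment: it applies the symplectic Euler method to a one-dimensional harmonic oscillator with H = (x² + y²)/2, a system chosen here purely as an assumption for illustration. The function names `symplectic_euler` and `energy` are hypothetical. Because dx/dt depends only on y and dy/dt depends only on x, each update is a single explicit operation, and the energy stays close to its initial value over many steps.

```python
def symplectic_euler(x, y, dt, steps):
    """Integrate dx/dt = y, dy/dt = -x (toy oscillator, illustrative only)."""
    for _ in range(steps):
        y -= dt * x   # momentum update uses only the current position
        x += dt * y   # position update uses only the updated momentum
    return x, y


def energy(x, y):
    """Energy of the toy oscillator, H = (x^2 + y^2) / 2."""
    return 0.5 * (x * x + y * y)


x, y = symplectic_euler(1.0, 0.0, dt=0.1, steps=10_000)
print(abs(energy(x, y) - 0.5))  # deviation from the initial energy of 0.5
```

The same alternating pattern, a momentum update followed by a position update, is what a separable Hamiltonian makes possible without solving an implicit equation at each step.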

The computing device 10 according to the embodiments may also calculate the optimum solution of equation (1) using equations of motion expressed by the new simultaneous ordinary differential equations shown in the following equations (9) and (10).

In equations (9) and (10), N, x_i, y_i, i, j, J_{i,j}, h_i, D, c, K, t, p(t), and a(t) are the same as in equations (2) to (4). In equations (9) and (10), n is an even number of 2 or more.

In this case, the computing device 10 according to the embodiments increases t from zero to a sufficiently large value in small time increments and, at each t, updates x_i and y_i using equations (9) and (10). The computing device 10 then outputs the sign (±1) of the final value of x_i, reached when t is sufficiently large, as the optimum solution for s_i.

Here, the matrix multiplication with the coefficient matrix J, which is the most computationally expensive operation, appears in equation (10) but not in equation (9). Therefore, when equations (9) and (10) are used as well, the computing device 10 needs to perform this matrix multiplication only to update y_i and not to update x_i. In addition, p(t) has been eliminated from equation (9), which gives the time derivative of x_i (dx_i/dt). Therefore, when equations (9) and (10) are used as well, the computing device 10 can calculate the optimum solution of an optimization problem using an Ising model with a small amount of computation.

Furthermore, equation (9), which gives the time derivative of x_i (dx_i/dt), contains y_i but not x_i. Equation (10), which gives the time derivative of y_i (dy_i/dt), contains x_i but not y_i.

That is, when equations (9) and (10) are used as well, x_i and y_i are separated from each other in the Hamiltonian. Therefore, as with equations (6) and (7), the computing device 10 can calculate the optimum solution of an optimization problem using an Ising model with simple operations and a simple configuration.

(First embodiment)
A computing device 10 according to the first embodiment will be described. In the description of each embodiment, devices, blocks, or circuits having substantially the same functions and configurations as those already described are denoted by the same reference numerals, and their detailed description is omitted except for the differences.

FIG. 1 is a diagram showing the configuration of the computing device 10 according to the first embodiment. The computing device 10 is a device that solves an optimization problem using an Ising model by using the simultaneous ordinary differential equations shown in equations (6) and (7) or the simultaneous ordinary differential equations shown in equations (9) and (10).

The computing device 10 includes a calculation unit 20, an input unit 22, an output unit 24, and a setting unit 26.

The calculation unit 20 is, for example, an arithmetic processing device that includes a memory and one or more processors such as CPUs (Central Processing Units). The calculation unit 20 may instead be a circuit as described in the second and subsequent embodiments.

The calculation unit 20 increases t, a parameter representing the sampling time, from a start time (for example, 0) in increments of a small time step (dt). At each sampling time, the calculation unit 20 updates the N first variables x_i and the N second variables y_i using the simultaneous ordinary differential equations shown in equations (6) and (7) or those shown in equations (9) and (10). The calculation unit 20 then outputs the N first variables x_i at an end time T, which is a predetermined sampling time.

Prior to the calculation processing by the calculation unit 20, the input unit 22 acquires the N first variables x_i and the N second variables y_i at the start time and supplies them to the calculation unit 20. The N first variables x_i and the N second variables y_i at the start time may be, for example, values generated from random numbers or preset values (for example, all 0 or all a predetermined value).

After the calculation processing by the calculation unit 20 is completed, the output unit 24 acquires the N first variables x_i at the end time T. The output unit 24 then outputs values representing the signs of the N first variables x_i at the end time T (for example, +1 or −1) as the optimum solution for the combination of the states of the N spins in the Ising model.
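The sign extraction performed by the output unit 24 can be sketched as follows. The function name `spins_from_positions` is hypothetical, and mapping a position of exactly zero to +1 is an assumption, since the text does not say how that case is handled.

```python
def spins_from_positions(xs):
    """Map each final position x_i to a spin value of +1 or -1."""
    # Assumption: a position of exactly zero is mapped to +1.
    return [1 if x >= 0 else -1 for x in xs]


print(spins_from_positions([0.83, -0.79, 0.02]))  # [1, -1, 1]
```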

Prior to the calculation processing by the calculation unit 20, the setting unit 26 sets, in the calculation unit 20, the parameters used in the simultaneous ordinary differential equations shown in equations (6) and (7) and in those shown in equations (9) and (10). More specifically, the setting unit 26 sets N, J, h, D, c, K, p(t), and a(t).

N is an integer of 2 or more representing the number of first variables and the number of second variables.
J is a coefficient matrix of N rows × N columns. J_{i,j} is the coefficient in row i and column j of the coefficient matrix J.
h is a coefficient array containing N coefficients. h_i is the i-th coefficient in the coefficient array h.
D, c, and K are constants.
p(t) is a function of the sampling time.
a(t) is a function of the sampling time.

Further, the setting unit 26 may set dt, T, and M.
dt is a constant representing the sampling period (a small time step).
T is a constant representing the sampling time that corresponds to the end time.
M is an integer of 1 or more representing the number of repetitions of the calculations of equations (6) and (7) or of equations (9) and (10).

Note that the setting unit 26 may change some of these parameters as desired in response to user operations. Alternatively, the setting unit 26 may keep these parameters at fixed values without changing them for each calculation.

FIG. 2 is a flowchart showing the flow of processing in the calculation unit 20.

First, in S11, the calculation unit 20 initializes t, p, and a. For example, the calculation unit 20 sets t, p, and a all to 0. t is a parameter representing the sampling time. p is a parameter representing the value of p(t) at time t. a is a parameter representing the value of a(t) at time t.

Subsequently, the calculation unit 20 repeats the processing from S13 to S20 until t becomes equal to or greater than the preset end time T (loop processing between S12 and S21). Note that when a(t) is an increasing function, the calculation unit 20 may instead repeat the processing from S13 to S20 until a becomes equal to or greater than a predetermined value.

In S13, the calculation unit 20 calculates N first intermediate variables x_i' by multiplying each of the N first variables x_i by the sampling period (small time step) dt and a preset coefficient c. That is, in S13, the calculation unit 20 executes the calculation of the following equation (21).
x_i' = dt × c × x_i ... (21)

Subsequently, in S14, the calculation unit 20 calculates N second intermediate variables b_i by matrix-multiplying the N first intermediate variables x_i' by the coefficient matrix J containing preset coefficients in N rows × N columns. That is, in S14, the calculation unit 20 executes the calculation of the following equation (22).

Note that the calculation unit 20 may execute the processing of S13 after the processing of S14. In this case, in S14, the calculation unit 20 calculates N values by matrix-multiplying the N first variables x_i by the coefficient matrix J. Subsequently, in S13, the calculation unit 20 calculates the N second intermediate variables b_i by multiplying each of the N values calculated in S14 by (dt × c).
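The two orderings of S13 and S14 give the same second intermediate variables because the scalar factor (dt × c) commutes with the matrix product. The following is a minimal sketch of both orderings. It assumes that equation (22) is the plain product b_i = Σ_j J_{i,j} x_j'; if the original equation carries a sign or other factor, the sketch would need the same adjustment. The function names are hypothetical, and the numerical values are arbitrary examples.

```python
def matvec(J, v):
    """Return the matrix-vector product J v for a square list-of-lists J."""
    return [sum(J_ij * v_j for J_ij, v_j in zip(row, v)) for row in J]


def b_scale_first(J, x, dt, c):
    """S13 then S14: scale x_i first, then multiply by J."""
    x_prime = [dt * c * x_i for x_i in x]   # equation (21)
    return matvec(J, x_prime)               # assumed form of equation (22)


def b_multiply_first(J, x, dt, c):
    """S14 then S13: multiply by J first, then scale by (dt * c)."""
    raw = matvec(J, x)
    return [dt * c * r for r in raw]


J = [[0.0, 1.0, -2.0], [1.0, 0.0, 0.5], [-2.0, 0.5, 0.0]]
x = [1.0, -2.0, 0.5]
print(b_scale_first(J, x, dt=0.5, c=0.25))
print(b_multiply_first(J, x, dt=0.5, c=0.25))
```

In floating-point arithmetic the two results can differ in the last bits for general inputs; the values above are chosen so that both orderings are exact.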

Subsequently, in S15, the calculation unit 20 updates the N second variables y_i by adding the corresponding second intermediate variable b_i to each of them. That is, in S15, the calculation unit 20 executes the calculation of the following equation (23).
y_i += b_i ... (23)

Subsequently, the calculation unit 20 repeats the processing from S17 to S18 M times (loop processing between S16 and S19). Note that M is an integer of 1 or more.

In S17, the calculation unit 20 updates the second variables y_i by executing a calculation according to equation (7) or equation (10). For example, when executing the calculation according to equation (7) in S17, the calculation unit 20 executes the calculation of the following equation (24).
y_i += dt' × [(−D + p − K x_i^2) x_i − c × h_i × a] ... (24)

Likewise, when executing the calculation according to equation (10) in S17, the calculation unit 20 executes the calculation of the following equation (25). In equation (25), n is an even number of 2 or more.
y_i += dt' × {[(−D + p)(1 + x_i^n) − K x_i^(n+2)] x_i − c × h_i × a} ... (25)
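The two variants of the S17 update can be written directly from equations (24) and (25). The following is a minimal sketch; the function names and the parameter name `dt_sub` (standing for dt') are hypothetical, and the numerical values in the usage lines are arbitrary examples rather than recommended settings.

```python
def update_y_eq24(x, y, h, *, D, K, c, p, a, dt_sub):
    """Equation (24): y_i += dt' * [(-D + p - K x_i^2) x_i - c h_i a]."""
    return [y_i + dt_sub * ((-D + p - K * x_i**2) * x_i - c * h_i * a)
            for x_i, y_i, h_i in zip(x, y, h)]


def update_y_eq25(x, y, h, *, D, K, c, p, a, dt_sub, n=2):
    """Equation (25); n must be an even integer of 2 or more."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even integer >= 2")
    return [y_i + dt_sub * (((-D + p) * (1 + x_i**n) - K * x_i**(n + 2)) * x_i
                            - c * h_i * a)
            for x_i, y_i, h_i in zip(x, y, h)]


params = dict(D=1.0, K=1.0, c=0.5, p=0.5, a=1.0, dt_sub=0.5)
print(update_y_eq24([0.5], [0.0], [1.0], **params))        # [-0.4375]
print(update_y_eq25([0.5], [0.0], [1.0], n=2, **params))   # [-0.421875]
```

Both functions read only x_i, h_i, and scalar parameters for element i, so the N updates are independent of one another and can be evaluated in parallel.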

Subsequently, in S18, the calculation unit 20 updates the first variables x_i by executing a calculation according to equation (6) or equation (9). Note that equations (6) and (9) are the same equation. For example, when executing the calculation according to equation (6) or equation (9) in S18, the calculation unit 20 executes the calculation of the following equation (26).
x_i += dt' × D × y_i ... (26)

The loop processing from S16 to S19 corresponds to the iterative calculation of the symplectic Euler method. Note that the calculation unit 20 may execute the processing of S17 and the processing of S18 in the reverse order. That is, the calculation unit 20 may update the second variables y_i after updating the first variables x_i.
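The loop from S16 to S19, including the option of swapping S17 and S18, can be sketched as follows. This is an illustrative sketch only: the function name `inner_loop` and the flag `y_first` are hypothetical, the update of y_i follows equation (24), and the relation dt' = dt / M is an assumption, since the text does not define dt' explicitly.

```python
def inner_loop(x, y, h, *, D, K, c, p, a, dt, M, y_first=True):
    """Run M symplectic Euler sub-steps (S16 to S19) and return (x, y)."""
    dt_sub = dt / M  # assumption: dt' is the sampling period divided by M

    def step_y(x, y):
        # S17, equation (24)
        return [y_i + dt_sub * ((-D + p - K * x_i**2) * x_i - c * h_i * a)
                for x_i, y_i, h_i in zip(x, y, h)]

    def step_x(x, y):
        # S18, equation (26)
        return [x_i + dt_sub * D * y_i for x_i, y_i in zip(x, y)]

    for _ in range(M):
        if y_first:
            y = step_y(x, y)
            x = step_x(x, y)
        else:
            x = step_x(x, y)
            y = step_y(x, y)
    return x, y


args = dict(D=1.0, K=1.0, c=0.5, p=0.5, a=0.0, dt=0.5, M=1)
print(inner_loop([0.5], [0.0], [0.0], **args))                 # ([0.40625], [-0.1875])
print(inner_loop([0.5], [0.0], [0.0], y_first=False, **args))  # ([0.5], [-0.1875])
```

In either order, the second update uses the value just produced by the first, which is the defining feature of the symplectic Euler method and distinguishes it from the explicit Euler method.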

After executing the loop processing from S16 to S19, the calculation unit 20 advances the processing to S20. In S20, the calculation unit 20 updates t by adding dt to t. The calculation unit 20 also updates p and a. For example, the calculation unit 20 updates p by adding a preset dp to p, and updates a by calculating the square root of the updated p.

After executing the processing of S20, the calculation unit 20 determines whether t has become equal to or greater than T. If t is smaller than T, the calculation unit 20 returns the processing to S13 and repeats the processing from S13. If t is equal to or greater than T, the calculation unit 20 ends this flow.
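The flow from S11 to S21 can be put together as the following sketch in plain Python. It is an illustration under stated assumptions, not the implementation of the embodiment: the function name `solve_ising` and all default parameter values are hypothetical, equation (22) is assumed to be the plain product b_i = Σ_j J_{i,j} x_j', dt' is assumed to equal dt / M, the update of y_i follows equation (24), and a position of exactly zero is mapped to +1.

```python
import math


def solve_ising(J, h, x, y, *, D=1.0, c=0.5, K=1.0,
                dt=0.5, dp=0.005, T=100.0, M=5):
    """Return N spin values (+1 or -1) following the flow S11 to S21."""
    n = len(x)
    x, y = list(x), list(y)
    dt_sub = dt / M              # assumption: dt' = dt / M
    t = p = a = 0.0              # S11: initialize t, p, and a
    while t < T:                 # S12 / S21: repeat until t >= T
        # S13, equation (21): first intermediate variables
        xp = [dt * c * x_i for x_i in x]
        # S14, assumed form of equation (22): second intermediate variables
        b = [sum(J[i][j] * xp[j] for j in range(n)) for i in range(n)]
        # S15, equation (23)
        y = [y_i + b_i for y_i, b_i in zip(y, b)]
        for _ in range(M):       # S16 to S19
            # S17, equation (24)
            y = [y_i + dt_sub * ((-D + p - K * x_i * x_i) * x_i - c * h_i * a)
                 for x_i, y_i, h_i in zip(x, y, h)]
            # S18, equation (26)
            x = [x_i + dt_sub * D * y_i for x_i, y_i in zip(x, y)]
        t += dt                  # S20: advance the sampling time
        p += dp                  # S20: p is increased by the preset dp
        a = math.sqrt(p)         # S20: a is the square root of the updated p
    # Output: sign of each final position (zero mapped to +1 by assumption)
    return [1 if x_i >= 0 else -1 for x_i in x]


# Two spins with a single positive coupling and identical initial values.
J = [[0.0, 1.0], [1.0, 0.0]]
print(solve_ising(J, [0.0, 0.0], [0.1, 0.1], [0.0, 0.0]))
```

The matrix product in S14 is evaluated once per sampling period, whereas the element-wise updates in S17 and S18 are evaluated M times, which reflects the point made in the text that the costly multiplication with J is needed only for the update of y_i. In practice the initial values would typically be random, as described for the input unit 22; identical values are used here only to keep the example deterministic.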

By executing the above processing, the computing device 10 according to the first embodiment can solve an optimization problem using the simultaneous ordinary differential equations of Equations (6) and (7), or those of Equations (9) and (10). The computing device 10 according to the first embodiment can update the first variables x_i and the second variables y_i at high speed with simple calculations or a simple configuration. It can therefore calculate the optimal solution of an optimization problem at high speed and at low cost.

(Second embodiment)
A computing device 10 according to the second embodiment will now be described.

FIG. 3 is a diagram showing the configuration of the calculation unit 20 according to the second embodiment. In the second embodiment, the computing device 10 is a circuit implemented by one or more semiconductor devices. The computing device 10 may be, for example, an FPGA (Field Programmable Gate Array), a gate array, or an application-specific integrated circuit (ASIC). Part of the computing device 10 may include a processor.

The calculation unit 20 according to the second embodiment includes a matrix multiplication unit 28, a time evolution unit 30, and a management unit 32.

The matrix multiplication unit 28 acquires, from the time evolution unit 30, the N first intermediate variables x_i'(t_1) at a first time t_1, which represents an arbitrary sampling time. The matrix multiplication unit 28 calculates the N second intermediate variables b_i(t_1) at the first time t_1 by matrix-multiplying the N first intermediate variables x_i'(t_1) at the first time t_1 by the coefficient matrix J.

For example, the matrix multiplication unit 28 has a coefficient matrix memory 36 and a matrix multiplication execution unit 38. The coefficient matrix memory 36 stores the coefficient matrix J. The matrix multiplication execution unit 38 matrix-multiplies the N first intermediate variables x_i'(t_1) at the first time t_1 by the coefficient matrix J.

The time evolution unit 30 acquires the N second intermediate variables b_i(t_1) at the first time t_1 from the matrix multiplication unit 28. Based on these N second intermediate variables b_i(t_1), the time evolution unit 30 calculates the N first variables x_i(t_2), the N second variables y_i(t_2), and the N first intermediate variables x_i'(t_2) at a second time t_2, which represents the sampling time one sampling period after the first time t_1.

The management unit 32 advances the sampling time from the start time by one sampling period at a time, and causes the matrix multiplication unit 28 and the time evolution unit 30 to execute the processing for each sampling time.

That is, the management unit 32 first causes, for example, the matrix multiplication unit 28 to calculate the N second intermediate variables b_i(t_1) at the first time t_1. It then causes the time evolution unit 30 to calculate the N first variables x_i(t_2), the N second variables y_i(t_2), and the N first intermediate variables x_i'(t_2) at the second time t_2. Next, it causes the matrix multiplication unit 28 to calculate the N second intermediate variables b_i(t_2) at the second time t_2. It then causes the time evolution unit 30 to calculate the N first variables x_i(t_3), the N second variables y_i(t_3), and the N first intermediate variables x_i'(t_3) at a third time t_3, which represents the sampling time one sampling period after the second time t_2. In this way, the management unit 32 causes the matrix multiplication unit 28 and the time evolution unit 30 to execute processing alternately while advancing the sampling time.

The N first variables x_i(t_0) and the N second variables y_i(t_0) at the start time (for example, t_0) are supplied in advance, for example by the input unit 22, before the calculation begins.

For example, the time evolution unit 30 has a first variable memory 40, a second variable memory 42, a first addition unit 44, a function calculation unit 46, a first multiplication unit 48, and a first intermediate variable memory 50.

The first variable memory 40 stores the N first variables x_i(t_1) at the first time t_1. The second variable memory 42 stores the N second variables y_i(t_1) at the first time t_1.

The first addition unit 44 acquires the N second intermediate variables b_i(t_1) at the first time t_1 calculated by the matrix multiplication unit 28. The first addition unit 44 updates the N second variables y_i(t_1) at the first time t_1 by adding the N second intermediate variables b_i(t_1) at the first time t_1 to the N second variables y_i(t_1) stored in the second variable memory 42. For example, the first addition unit 44 updates the N second variables y_i(t_1) in index order, from the second variable y_0(t_1) of the first index (i = 0) to the second variable y_{N-1}(t_1) of the last index (i = N-1).

In the embodiments, adding N first values and N second values means generating N third values by adding together the values that share the same index.
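
As a small illustration of this index-wise addition, for instance the update of y_i(t_1) by b_i(t_1) in the first addition unit 44, consider the following sketch. The function name `add_elementwise` is hypothetical.

```python
def add_elementwise(first, second):
    """Return N third values, each the sum of the two same-index inputs."""
    if len(first) != len(second):
        raise ValueError("both inputs must hold the same number of values")
    return [u + v for u, v in zip(first, second)]
```

For example, `add_elementwise(y, b)` pairs y_0 with b_0, y_1 with b_1, and so on; no value is ever combined with a value of a different index.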

Based on the N first variables x_i(t_1) at the first time t_1 stored in the first variable memory 40 and the N updated second variables y_i(t_1) at the first time t_1 calculated by the first addition unit 44, the function calculation unit 46 calculates the N first variables x_i(t_2) and the N second variables y_i(t_2) at the second time t_2. For example, the function calculation unit 46 calculates them in index order, from the first variable x_0(t_2) and second variable y_0(t_2) of the first index (i = 0) to the first variable x_{N-1}(t_2) and second variable y_{N-1}(t_2) of the last index (i = N-1).

The function calculation unit 46 writes the N first variables x_i(t_2) at the second time t_2 to the first variable memory 40. For example, it writes them to the first variable memory 40 sequentially, starting from the first variable x_0(t_2) of the first index (i = 0). The first variable memory 40 is, for example, a dual-port memory, which can write data to one address while reading data from another. When the first variable memory 40 is a dual-port memory, the function calculation unit 46 can overwrite the addresses that held the N first variables x_i(t_1) at the first time t_1 with the N first variables x_i(t_2) at the second time t_2.

The function calculation unit 46 writes the N second variables y_i(t_2) at the second time t_2 to the second variable memory 42. For example, it writes them to the second variable memory 42 sequentially, starting from the second variable y_0(t_2) of the first index. The second variable memory 42 is, for example, a dual-port memory. When the second variable memory 42 is a dual-port memory, the function calculation unit 46 can overwrite the addresses that held the N second variables y_i(t_1) at the first time t_1 with the N second variables y_i(t_2) at the second time t_2.

The first multiplication unit 48 calculates the N first intermediate variables x_i'(t_2) at the second time t_2 by multiplying each of the N first variables x_i(t_2) at the second time t_2 by a preset value ((dt × c) in this embodiment). The first intermediate variable memory 50 stores the N first intermediate variables x_i'(t_2) calculated by the first multiplication unit 48. These variables are stored temporarily in the first intermediate variable memory 50 and then sent to the matrix multiplication unit 28. The first intermediate variable memory 50 holds each of them from the time it is generated by the first multiplication unit 48 until it is sent to the matrix multiplication unit 28. The first intermediate variable memory 50 may include, for example, a FIFO (First In First Out) memory.

In this configuration, the matrix multiplication unit 28 executes S14 described in the first embodiment. The first addition unit 44 executes S15. The function calculation unit 46 executes S16 through S19. The first multiplication unit 48 executes S13. The management unit 32 executes S11 and S20 and manages the loop between S12 and S21. The calculation unit 20 according to the second embodiment can therefore, as in the first embodiment, calculate the optimal solution of an optimization problem at high speed and at low cost.

The first multiplication unit 48 may be placed before the first addition unit 44 instead of before the first intermediate variable memory 50. This change in placement corresponds to swapping the order of S13 and S14 in the first embodiment. However, when the value by which each of the N first variables x_i(t_2) at the second time t_2 is multiplied ((dt × c) in this embodiment) is smaller than 1, the first multiplication unit 48 is preferably placed before the first intermediate variable memory 50. In that position it reduces the number of digits of the N first intermediate variables x_i'(t_2) at the second time t_2 and thereby lowers the probability of overflow in the matrix multiplication unit 28.

When the first multiplication unit 48 is placed before the first addition unit 44, the first intermediate variable memory 50 stores the N first variables x_i(t_2) at the second time t_2 as the N first intermediate variables x_i'(t_2) at the second time t_2. In this case, the first multiplication unit 48 multiplies each of the N second intermediate variables b_i(t_1) at the first time t_1 output from the matrix multiplication unit 28 by the preset value (dt × c). The first addition unit 44 then updates the N second variables y_i(t_1) at the first time t_1 by adding to them the N second intermediate variables b_i(t_1) that the first multiplication unit 48 has multiplied by the preset value (dt × c).
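
The two placements of the first multiplication unit 48 give the same result because matrix multiplication is linear, while the values that pass through the matrix multiplication differ in magnitude. The following sketch illustrates this with made-up numbers. The helper names `matvec` and `scale`, the sample matrix, and the scale factor `s` (standing in for dt × c) are assumptions, and the row-by-vector convention of `matvec` is also assumed.

```python
def matvec(J, v):
    """Multiply matrix J (list of rows) by vector v."""
    return [sum(J_ij * v_j for J_ij, v_j in zip(row, v)) for row in J]


def scale(v, s):
    """Multiply every element of v by the scalar s."""
    return [s * v_i for v_i in v]


J = [[0, 3, -2], [3, 0, 5], [-2, 5, 0]]   # illustrative coefficient matrix
x = [4, -8, 16]                           # illustrative first variables
s = 0.25                                  # stands in for dt * c (smaller than 1)

before = matvec(J, scale(x, s))   # multiplier ahead of the intermediate memory
after = scale(matvec(J, x), s)    # multiplier ahead of the first addition unit

# Largest magnitude leaving the matrix multiplication in each placement.
peak_early = max(abs(v) for v in before)
peak_late = max(abs(v) for v in matvec(J, x))
```

With these numbers `before` and `after` are identical, but the matrix multiplication output peaks at a smaller magnitude when the scaling is applied first, which is the effect the text relies on to reduce overflow.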

The time evolution unit 30 may output the N first intermediate variables x_i'(t_1) at the first time t_1 to the matrix multiplication unit 28 as a first intermediate stream X' that carries a first number (an integer of 1 or more) of first intermediate variables x_i'(t_1) per clock cycle. Likewise, the matrix multiplication unit 28 may output the N second intermediate variables b_i(t_1) at the first time t_1 to the time evolution unit 30 as a second intermediate stream B that carries a second number (an integer of 1 or more) of second intermediate variables b_i(t_1) per clock cycle.

Here, a stream means time-series data. More specifically, a stream is a data sequence obtained by dividing N data items into data sets of, for example, P items each (P is an integer of 1 or more), and it contains one data set of P items in each clock cycle.

When the matrix multiplication unit 28 and the time evolution unit 30 receive such a stream, they process the data sets in the order received. They can therefore start processing before all the data in the stream has arrived. Also, once the first data set of the N data items has been calculated, they start transmitting the data sets sequentially, in the order in which calculation completes. They can therefore let the next unit start processing before the calculation of all the data is complete.
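
The streaming behaviour can be imitated in software with Python generators, where each yielded list plays the role of the data set carried in one clock cycle. This is an analogy only: the names `to_stream` and `scale_stream` are hypothetical, and the handling of a final, shorter data set when N is not a multiple of P is an assumption.

```python
def to_stream(values, P):
    """Yield one data set of up to P values per simulated clock cycle."""
    for start in range(0, len(values), P):
        yield values[start:start + P]


def scale_stream(stream, factor):
    """Process each data set as soon as it arrives and forward the result."""
    for data_set in stream:
        yield [factor * v for v in data_set]
```

Because generators are lazy, `next(scale_stream(to_stream(data, P), factor))` produces the first processed data set without touching the remaining data, mirroring how a downstream unit can start before the upstream unit has finished.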

FIG. 4 is a diagram showing the relationship among the N first intermediate variables x_i', the coefficient matrix J, and the N second intermediate variables b_i in the second embodiment. At each sampling time, the matrix multiplication unit 28 acquires the N first intermediate variables (x_0', x_1', x_2', ..., x_i', ..., x_{N-1}'). The matrix multiplication unit 28 also stores the coefficient matrix (J_{0,0}, J_{0,1}, J_{0,2}, ..., J_{i,j}, ..., J_{N-1,N-1}), which contains N rows × N columns of coefficients.

The matrix multiplication unit 28 then calculates the N second intermediate variables (b_0, b_1, b_2, ..., b_i, ..., b_{N-1}) by matrix-multiplying the N first intermediate variables (x_0', x_1', x_2', ..., x_i', ..., x_{N-1}') by the coefficient matrix (J_{0,0}, J_{0,1}, J_{0,2}, ..., J_{i,j}, ..., J_{N-1,N-1}).
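
Written out component by component, and assuming the usual row-by-column convention (the actual index order is fixed by FIG. 4, which is not reproduced here), this matrix multiplication would read:

$$b_i = \sum_{j=0}^{N-1} J_{i,j}\, x_j', \qquad i = 0, 1, \ldots, N-1$$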

In the second embodiment, the matrix multiplication unit 28 may perform the matrix multiplication by any kind of processing. For example, the matrix multiplication unit 28 may contain a processor or the like and perform the matrix multiplication by executing a program.

FIG. 5 is a diagram showing a first example of the configuration of the function calculation unit 46. The function calculation unit 46 according to the first example has a first FX calculation unit 51-1, a first FX addition unit 52-1, a first FY calculation unit 53-1, and a first FY addition unit 54-1.

The first FX calculation unit 51-1 calculates N second differential values by performing the first function operation (FX(x_i)) on each of the N first variables at the first time. The first FX addition unit 52-1 calculates N second update values by adding the N second differential values calculated by the first FX calculation unit 51-1 to the N updated second variables calculated by the first addition unit 44.

The first FY calculation unit 53-1 calculates N first differential values by performing the second function operation (FY(y_i)) on each of the N second update values calculated by the first FX addition unit 52-1. The first FY addition unit 54-1 calculates N first update values by adding the N first differential values calculated by the first FY calculation unit 53-1 to the N first variables at the first time.

The function calculation unit 46 then supplies the N first update values calculated by the first FY addition unit 54-1 to the first variable memory 40. The first variable memory 40 stores these N first update values as the N first variables at the second time.

The function calculation unit 46 also supplies the N second update values calculated by the first FX addition unit 52-1 to the second variable memory 42. The second variable memory 42 stores these N second update values as the N second variables at the second time.

Here, the first function operation is the operation of Equation (31) below.
$$FX(x_i) = dt' \times \left[\left(-D + p - K x_i^{2}\right) x_i - c \times h_i \times a\right] \qquad (31)$$

The second function operation is the operation of Equation (32) below.
$$FY(y_i) = dt' \times D \times y_i \qquad (32)$$

Alternatively, the first function operation may be the operation of Equation (33) below.
$$FX(x_i) = dt' \times \left\{\left[\left(-D + p\right)\left(1 + x_i^{n}\right) - K x_i^{n+2}\right] x_i - c \times h_i \times a\right\} \qquad (33)$$

In Equations (31), (32), and (33), x_i is the i-th of the N first variables at the first time, or the i-th of the N first update values. y_i is the i-th of the N updated second variables calculated by the first addition unit 44, or the i-th of the N second update values.

dt' is a preset small time increment. D, c, and K are preset constants. h_i is a coefficient set for each i. p and a are values that increase at each sampling time according to a predetermined formula.
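
Equations (31) to (33) translate directly into code. The sketch below is a plain transcription under stated assumptions: the function names `fx_31`, `fx_33`, and `fy_32` and the argument order are hypothetical, `dt_sub` stands for dt', and the exponent `n` of Equation (33) is taken as a caller-supplied value because it is not defined in this passage.

```python
def fx_31(x_i, h_i, p, a, D, K, c, dt_sub):
    """Equation (31): dt' * [(-D + p - K*x_i^2) * x_i - c*h_i*a]."""
    return dt_sub * ((-D + p - K * x_i ** 2) * x_i - c * h_i * a)


def fx_33(x_i, h_i, p, a, D, K, c, dt_sub, n):
    """Equation (33): dt' * {[(-D + p)(1 + x_i^n) - K*x_i^(n+2)] * x_i - c*h_i*a}."""
    bracket = (-D + p) * (1 + x_i ** n) - K * x_i ** (n + 2)
    return dt_sub * (bracket * x_i - c * h_i * a)


def fy_32(y_i, D, dt_sub):
    """Equation (32): dt' * D * y_i."""
    return dt_sub * D * y_i
```

All three functions return an increment rather than a new state: the result of the first function operation is added to a second variable, and the result of the second function operation is added to a first variable.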

The function calculation unit 46 of the first example can execute the loop from S16 to S19 of the first embodiment once, in the order S17 → S18. Using the method known as the symplectic Euler method, it can thus time-evolve the N first variables and N second variables at the first time to calculate the N first variables and N second variables at the second time.

The function calculation unit 46 according to the first example may also perform the calculation by pipeline processing. In that case, for example, the function calculation unit 46 receives one variable pair, consisting of a first variable X_IN and a second variable Y_IN, per clock cycle in index order. It then performs the calculation on the received pair to produce one variable pair consisting of the processed first variable X_OUT and second variable Y_OUT.

Specifically, for pipeline processing the function calculation unit 46 further has two stages of X transfer registers 55-1 and 55-2, two stages of Y transfer registers 56-1 and 56-2, one X output register 57, and one Y output register 58.

At each clock cycle, the first-stage X transfer register 55-1 acquires from the first variable memory 40 one first variable X_IN out of the N first variables at the first time, and holds it for one clock cycle. At each clock cycle, the first-stage Y transfer register 56-1 acquires one second variable Y_IN out of the N updated second variables calculated by the first addition unit 44, and holds it for one clock cycle. In any given clock cycle, the first-stage X transfer register 55-1 and the first-stage Y transfer register 56-1 acquire a first variable and a second variable with the same index i.

At each clock cycle, the first FX calculation unit 51-1 calculates a second differential value by performing the first function operation on the first variable X_IN stored in the first-stage X transfer register 55-1. At each clock cycle, the first FX addition unit 52-1 calculates the second update value Y_1 by adding the second differential value calculated by the first FX calculation unit 51-1 in that clock cycle to the second variable Y_IN stored in the first-stage Y transfer register 56-1.

At each clock cycle, the second-stage X transfer register 55-2 acquires the first variable X_IN that the first-stage X transfer register 55-1 held in the immediately preceding clock cycle, and holds it for one clock cycle. At each clock cycle, the second-stage Y transfer register 56-2 acquires the second update value Y_1 that the first FX addition unit 52-1 calculated in the immediately preceding clock cycle, and holds it for one clock cycle.

At each clock cycle, the first FY calculation unit 53-1 calculates a first differential value by performing the second function operation on the second update value Y_1 stored in the second-stage Y transfer register 56-2. At each clock cycle, the first FY addition unit 54-1 calculates the first update value X_1 by adding the first differential value calculated by the first FY calculation unit 53-1 in that clock cycle to the first variable X_IN stored in the second-stage X transfer register 55-2.

At each clock cycle, the X output register 57 acquires the first update value X_1 that the first FY addition unit 54-1 calculated in the immediately preceding clock cycle, and has it stored in the first variable memory 40 as one first variable X_OUT at the second time.

At each clock cycle, the Y output register 58 acquires the second update value Y_1 stored in the second-stage Y transfer register 56-2, and has it stored in the second variable memory 42 as one second variable Y_OUT at the second time.

By executing this pipeline processing, the function calculation unit 46 according to the first example can perform the following calculations at each clock cycle.

$$
\begin{aligned}
Y_1 &= FX(X_{IN}) + Y_{IN} \\
X_1 &= FY(Y_1) + X_{IN} \\
Y_{OUT} &= Y_1 \\
X_{OUT} &= X_1
\end{aligned}
$$

The function calculation unit 46 according to the first example can thus execute, once, the series of operations first function operation → FX addition → second function operation → FY addition on a variable pair consisting of one first variable X_IN and one second variable Y_IN with the same index i. Moreover, because it uses pipeline processing, it can perform the calculations on several variable pairs in parallel. The function calculation unit 46 according to the first example can therefore complete the calculation for N variable pairs in a short time.
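
The register-level behaviour of the first example can be checked with a small cycle-by-cycle model. This is a behavioural sketch, not a description of the actual circuit: the function name `run_pipeline`, the tuple layout of the registers, and passing the index i to `fx` (so that an index-dependent term such as h_i can be looked up) are assumptions made for illustration.

```python
def run_pipeline(xs, ys, fx, fy):
    """Behavioural model of the two-stage pipeline of the first example.

    fx(x, i) and fy(y) are caller-supplied stand-ins for the first and
    second function operations. Returns (x_out, y_out, clock_edges).
    """
    n = len(xs)
    stage1 = None   # models transfer registers 55-1 / 56-1: (i, X_IN, Y_IN)
    stage2 = None   # models transfer registers 55-2 / 56-2: (i, X_IN, Y_1)
    x_out, y_out = [None] * n, [None] * n
    feed = iter(enumerate(zip(xs, ys)))
    edges = 0
    while True:
        # Everything latched at this clock edge is computed from the
        # register contents held during the preceding cycle.
        if stage2 is not None:
            i, x_in, y1 = stage2
            x_out[i] = fy(y1) + x_in     # X_1 = FY(Y_1) + X_IN, then X_OUT = X_1
            y_out[i] = y1                # Y_OUT = Y_1
        if stage1 is not None:
            i, x_in, y_in = stage1
            next_stage2 = (i, x_in, fx(x_in, i) + y_in)   # Y_1 = FX(X_IN) + Y_IN
        else:
            next_stage2 = None
        item = next(feed, None)          # one new variable pair per cycle
        stage1 = None if item is None else (item[0], item[1][0], item[1][1])
        stage2 = next_stage2
        edges += 1
        if stage1 is None and stage2 is None:
            break                        # pipeline has drained
    return x_out, y_out, edges
```

In this model, several variable pairs are in flight at once: while one pair is in the second stage, the next pair is already in the first stage, which is the overlap the text describes.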

FIG. 6 is a diagram showing a second example of the configuration of the function calculation unit 46. The function calculation unit 46 may have the configuration of the second example shown in FIG. 6. The second example has the same components as the first example, but they are arranged differently. In describing the second example, components that operate in the same way as in the first example are given the same reference numerals, and their detailed description is omitted except for the differences.

The first FY calculation unit 53-1 calculates N first differential values by performing the second function operation (FY(y_i)) on each of the N updated second variables calculated by the first addition unit 44. The first FY addition unit 54-1 calculates N first update values by adding the N first differential values calculated by the first FY calculation unit 53-1 to the N first variables at the first time.

The first FX calculation unit 51-1 calculates N second differential values by performing the first function operation (FX(x_i)) on each of the N first update values calculated by the first FY addition unit 54-1. The first FX addition unit 52-1 calculates N second update values by adding the N second differential values calculated by the first FX calculation unit 51-1 to the N updated second variables calculated by the first addition unit 44.

The function calculation unit 46 of the second example can execute the loop from S16 to S19 of the first embodiment once, in the order S18 → S17. Using the method known as the symplectic Euler method, it can thus time-evolve the N first variables and N second variables at the first time to calculate the N first variables and N second variables at the second time.
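
In formula form, and writing y_i for the updated second variable supplied by the first addition unit 44, the data flow of the second example described above can be summarized as follows. The notation is a restatement for illustration, not an equation taken from the original text.

$$
\begin{aligned}
x_i(t_2) &= x_i(t_1) + FY(y_i) \\
y_i(t_2) &= y_i + FX\left(x_i(t_2)\right)
\end{aligned}
$$

The only difference from the first example is which variable is updated first: here the second function operation runs on the incoming y_i, and the first function operation then uses the freshly updated first variable.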

さらに、第２例に係る関数演算部４６は、パイプライン処理により演算を実現してもよい。この場合も、第２例に係る関数演算部４６は、第１例と同一の構成要素を有するが、各構成要素の配置が異なる。 Furthermore, the function calculation unit 46 according to the second example may realize calculation by pipeline processing. In this case as well, the function calculator 46 according to the second example has the same components as those of the first example, but the arrangement of the components is different.

In each clock cycle, the first FY calculation unit 53-1 calculates a first differential value by executing the second function operation on the updated second variable Y_IN stored in the first-stage Y transfer register 56-1. In each clock cycle, the first FY addition unit 54-1 calculates the first update value X_1 by adding the first differential value calculated in that clock cycle by the first FY calculation unit 53-1 to the first variable X_IN stored in the first-stage X transfer register 55-1.

In each clock cycle, the second-stage X transfer register 55-2 acquires the first update value X_1 calculated by the first FY addition unit 54-1 in the immediately preceding clock cycle and holds it for one clock cycle. In each clock cycle, the second-stage Y transfer register 56-2 acquires the second variable Y_IN that the first-stage Y transfer register 56-1 held in the immediately preceding clock cycle and holds it for one clock cycle.

In each clock cycle, the first FX calculation unit 51-1 calculates a second differential value by executing the first function operation on the first update value X_1 stored in the second-stage X transfer register 55-2. In each clock cycle, the first FX addition unit 52-1 calculates the second update value Y_1 by adding the second differential value calculated in that clock cycle by the first FX calculation unit 51-1 to the second variable Y_IN stored in the second-stage Y transfer register 56-2.

By executing such pipeline processing, the function calculation unit 46 according to the second example can execute the following operations in each clock cycle.

X_1 = FY(Y_IN) + X_IN
Y_1 = FX(X_1) + Y_IN
X_OUT = X_1
Y_OUT = Y_1

For a variable pair consisting of one first variable X_IN and one second variable Y_IN with the same index i, the function calculation unit 46 according to the second example can execute the operation set of second function operation → FY addition → first function operation → FX addition once. Furthermore, because the function calculation unit 46 according to the second example executes pipeline processing, it can operate on a plurality of variable pairs in parallel. The function calculation unit 46 according to the second example can thereby complete the operations on the N variable pairs in a short time.
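The single pass of the second example can be sketched in software as follows. This is a minimal illustration, not the circuit itself: the function name `step_fy_then_fx` and the placeholder functions `toy_fx` and `toy_fy` are assumptions introduced here, and they do not reproduce the FX and FY operations defined in the first embodiment.

```python
from typing import Callable, List, Tuple

Func = Callable[[float], float]


def step_fy_then_fx(x_in: List[float], y_in: List[float],
                    fx: Func, fy: Func) -> Tuple[List[float], List[float]]:
    """One pass in the second example's order: FY, FY addition, FX, FX addition."""
    # X_1 = FY(Y_IN) + X_IN, element by element
    x_1 = [fy(y) + x for x, y in zip(x_in, y_in)]
    # Y_1 = FX(X_1) + Y_IN, using the freshly updated X_1
    y_1 = [fx(x) + y for x, y in zip(x_1, y_in)]
    return x_1, y_1


# Hypothetical placeholder functions, chosen only to keep the arithmetic simple.
def toy_fx(x: float) -> float:
    return -0.5 * x


def toy_fy(y: float) -> float:
    return 0.25 * y


x_out, y_out = step_fy_then_fx([1.0, 0.0], [2.0, 4.0], toy_fx, toy_fy)
print(x_out, y_out)
```

Note that Y_1 is computed from X_1 rather than from X_IN; using the already updated value in the second half of the pass is what distinguishes this scheme from a plain explicit Euler step.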

FIG. 7 is a diagram showing a third example of the configuration of the function calculation unit 46. The function calculation unit 46 may have the configuration of the third example shown in FIG. 7. The function calculation unit 46 according to the third example has substantially the same configuration as the first example. In describing the third example, components that operate in the same manner as in the first example are denoted by the same reference numerals, and their detailed description is omitted except for the differences.

In addition to the configuration of the first example, the function calculation unit 46 according to the third example further includes (M-1) FX calculation units 51-2 to 51-M (second to Mth, where M is an integer of 2 or more), (M-1) FX addition units 52-2 to 52-M (second to Mth), (M-1) FY calculation units 53-2 to 53-M (second to Mth), and (M-1) FY addition units 54-2 to 54-M (second to Mth).

The mth FX calculation unit 51-m (m is any integer from 2 to M) calculates N second differential values by performing the first function operation on each of the N first update values calculated by the (m-1)th FY addition unit 54-(m-1). The mth FX addition unit 52-m calculates N new second update values by adding the N second differential values calculated by the mth FX calculation unit 51-m to the N second update values calculated by the (m-1)th FX addition unit 52-(m-1).

The mth FY calculation unit 53-m calculates N first differential values by performing the second function operation on each of the N second update values calculated by the mth FX addition unit 52-m. The mth FY addition unit 54-m calculates N new first update values by adding the N first differential values calculated by the mth FY calculation unit 53-m to the N first update values calculated by the (m-1)th FY addition unit 54-(m-1).

The function calculation unit 46 then supplies the N first update values calculated by the Mth FY addition unit 54-M to the first variable memory 40. The first variable memory 40 stores the N first update values calculated by the Mth FY addition unit 54-M as the N first variables at the second time.

The function calculation unit 46 also supplies the N second update values calculated by the Mth FX addition unit 52-M to the second variable memory 42. The second variable memory 42 stores the N second update values calculated by the Mth FX addition unit 52-M as the N second variables at the second time.

The function calculation unit 46 configured as in the third example can execute the loop processing from S16 to S19 of the first embodiment M times, in the order S17→S18. The function calculation unit 46 can thereby use a method called the symplectic Euler method to time-evolve the N first variables and the N second variables at the first time and calculate the N first variables and the N second variables at the second time.
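To illustrate why the symplectic Euler ordering is attractive, the sketch below repeats the S17→S18 pass M times on a toy harmonic oscillator. The choice FX(x) = -h·x and FY(y) = h·y, the step size `H`, and all function names are assumptions made for this example only; they are not the functions of the first embodiment. For this toy choice the quantity x² + y² - h·x·y is preserved by each pass in exact arithmetic, so the energy (x² + y²)/2 stays within a narrow band instead of drifting.

```python
from typing import Callable, Tuple

Func = Callable[[float], float]

H = 0.1  # assumed step size for the toy oscillator


def toy_fx(x: float) -> float:
    return -H * x  # hypothetical FX: force of a unit harmonic oscillator


def toy_fy(y: float) -> float:
    return H * y   # hypothetical FY: velocity term


def iterate_fx_then_fy(x: float, y: float, fx: Func, fy: Func,
                       m_passes: int) -> Tuple[float, float]:
    """Repeat Y_m = FX(X_{m-1}) + Y_{m-1}, X_m = FY(Y_m) + X_{m-1} for M passes."""
    for _ in range(m_passes):
        y = fx(x) + y   # FX calculation followed by FX addition
        x = fy(y) + x   # FY calculation followed by FY addition, using the new y
    return x, y


def energy(x: float, y: float) -> float:
    return 0.5 * (x * x + y * y)


def invariant(x: float, y: float) -> float:
    # Preserved by each pass for this toy choice of FX and FY.
    return x * x + y * y - H * x * y


x_end, y_end = iterate_fx_then_fy(1.0, 0.0, toy_fx, toy_fy, 1000)
print(energy(x_end, y_end), invariant(x_end, y_end))
```

The same loop with a different pair of functions would model other systems; only the update order, not the toy oscillator, is taken from the description above.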

Furthermore, the function calculation unit 46 according to the third example may perform its operations by pipeline processing. In this case, the function calculation unit 46 further includes 2M stages of X transfer registers 55-1 to 55-2M, 2M stages of Y transfer registers 56-1 to 56-2M, one X output register 57, and one Y output register 58. Of these components, those already shown in the first example operate in the same manner as in the first example, except for the X output register 57 and the Y output register 58.

In each clock cycle, the (2m-1)th-stage X transfer register 55-(2m-1) acquires the first update value X_{m-1} calculated by the (m-1)th FY addition unit 54-(m-1) in the immediately preceding clock cycle and holds it for one clock cycle. In each clock cycle, the (2m-1)th-stage Y transfer register 56-(2m-1) acquires the second update value Y_{m-1} that the (2m-2)th-stage Y transfer register 56-(2m-2) held in the immediately preceding clock cycle and holds it for one clock cycle.

In each clock cycle, the mth FX calculation unit 51-m calculates a second differential value by executing the first function operation on the first update value X_{m-1} stored in the (2m-1)th-stage X transfer register 55-(2m-1). In each clock cycle, the mth FX addition unit 52-m calculates a new second update value Y_m by adding the second differential value calculated in that clock cycle by the mth FX calculation unit 51-m to the second update value Y_{m-1} stored in the (2m-1)th-stage Y transfer register 56-(2m-1).

In each clock cycle, the 2mth-stage X transfer register 55-2m acquires the first update value X_{m-1} that the (2m-1)th-stage X transfer register 55-(2m-1) held in the immediately preceding clock cycle and holds it for one clock cycle. In each clock cycle, the 2mth-stage Y transfer register 56-2m acquires the second update value Y_m calculated by the mth FX addition unit 52-m in the immediately preceding clock cycle and holds it for one clock cycle.

In each clock cycle, the mth FY calculation unit 53-m calculates a first differential value by executing the second function operation on the second update value Y_m stored in the 2mth-stage Y transfer register 56-2m. In each clock cycle, the mth FY addition unit 54-m calculates a new first update value X_m by adding the first differential value calculated in that clock cycle by the mth FY calculation unit 53-m to the first update value X_{m-1} stored in the 2mth-stage X transfer register 55-2m.

In each clock cycle, the X output register 57 acquires the first update value X_M calculated by the Mth FY addition unit 54-M in the immediately preceding clock cycle and stores it in the first variable memory 40 as one first variable X_OUT at the second time.

In each clock cycle, the Y output register 58 acquires the second update value Y_M stored in the 2Mth-stage Y transfer register 56-2M and stores it in the second variable memory 42 as one second variable Y_OUT at the second time.

By executing such pipeline processing, the function calculation unit 46 according to the third example can execute the following operations in each clock cycle.

Y_1 = FX(X_IN) + Y_IN
X_1 = FY(Y_1) + X_IN
…
Y_m = FX(X_{m-1}) + Y_{m-1}
X_m = FY(Y_m) + X_{m-1}
…
Y_M = FX(X_{M-1}) + Y_{M-1}
X_M = FY(Y_M) + X_{M-1}
Y_OUT = Y_M
X_OUT = X_M

For a variable pair consisting of one first variable X_IN and one second variable Y_IN with the same index i, the function calculation unit 46 according to the third example can execute the operation set of first function operation → FX addition → second function operation → FY addition M times. Furthermore, because the function calculation unit 46 according to the third example operates by pipeline processing, it can operate on a plurality of variable pairs in parallel. The function calculation unit 46 according to the third example can thereby complete the operations on the N variable pairs in a short time.
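The pipelined behaviour of the third example can be modelled with a small cycle-by-cycle simulation. This is a simplified sketch under stated assumptions: each X/Y transfer register pair is collapsed into one tuple per stage, the output registers are modelled as a list, and the names `run_pipeline`, `direct`, `toy_fx`, and `toy_fy` are hypothetical. It checks that streaming one pair per clock cycle yields the same values as running the M passes directly on each pair.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[float, float]
Func = Callable[[float], float]


def direct(pair: Pair, fx: Func, fy: Func, m_passes: int) -> Pair:
    """Reference: M passes of Y_m = FX(X_{m-1}) + Y_{m-1}, X_m = FY(Y_m) + X_{m-1}."""
    x, y = pair
    for _ in range(m_passes):
        y = fx(x) + y
        x = fy(y) + x
    return x, y


def run_pipeline(pairs: List[Pair], fx: Func, fy: Func,
                 m_passes: int) -> Tuple[List[Pair], int]:
    """Stream pairs through 2M register stages; return (outputs, clock cycles used)."""
    if m_passes < 1:
        raise ValueError("m_passes must be at least 1")
    n_stages = 2 * m_passes
    regs: List[Optional[Pair]] = [None] * n_stages  # regs[s] models transfer stage s + 1

    def logic_after(stage: int, held: Optional[Pair]) -> Optional[Pair]:
        if held is None:            # pipeline bubble, nothing to compute
            return None
        x, y = held
        if stage % 2 == 0:          # odd-numbered stage feeds an FX unit and FX adder
            return x, fx(x) + y
        return fy(y) + x, y         # even-numbered stage feeds an FY unit and FY adder

    outputs: List[Pair] = []
    cycles = 0
    next_in = 0
    while len(outputs) < len(pairs):
        done = logic_after(n_stages - 1, regs[-1])  # captured by the output registers
        if done is not None:
            outputs.append(done)
        shifted = [logic_after(s - 1, regs[s - 1]) for s in range(1, n_stages)]
        head = pairs[next_in] if next_in < len(pairs) else None
        next_in += 1
        regs = [head] + shifted     # all registers update on the same clock edge
        cycles += 1
    return outputs, cycles


def toy_fx(x: float) -> float:
    return -0.5 * x   # hypothetical placeholder for FX


def toy_fy(y: float) -> float:
    return 0.25 * y   # hypothetical placeholder for FY


pairs = [(1.0, 2.0), (0.5, -1.0), (3.0, 0.0)]
outs, used = run_pipeline(pairs, toy_fx, toy_fy, m_passes=2)
print(outs, used)
```

In this model, N pairs finish in N + 2M clock cycles, whereas processing each pair through all 2M stages before starting the next would take on the order of N × 2M cycles. The exact cycle count of a real circuit depends on details the model omits.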

FIG. 8 is a diagram showing a fourth example of the configuration of the function calculation unit 46. The function calculation unit 46 may have the configuration of the fourth example shown in FIG. 8. The function calculation unit 46 according to the fourth example has substantially the same configuration as the second example. In describing the fourth example, components that operate in the same manner as in the second example are denoted by the same reference numerals, and their detailed description is omitted except for the differences.

The mth FY calculation unit 53-m (m is any integer from 2 to M) calculates N first differential values by performing the second function operation on each of the N second update values calculated by the (m-1)th FX addition unit 52-(m-1). The mth FY addition unit 54-m calculates N new first update values by adding the N first differential values calculated by the mth FY calculation unit 53-m to the N first update values calculated by the (m-1)th FY addition unit 54-(m-1).

The mth FX calculation unit 51-m calculates N second differential values by performing the first function operation on each of the N first update values calculated by the mth FY addition unit 54-m. The mth FX addition unit 52-m calculates N new second update values by adding the N second differential values calculated by the mth FX calculation unit 51-m to the N second update values calculated by the (m-1)th FX addition unit 52-(m-1).

The function calculation unit 46 configured as in the fourth example can execute the loop processing from S16 to S19 of the first embodiment M times, in the order S18→S17. The function calculation unit 46 can thereby use a method called the symplectic Euler method to time-evolve the N first variables and the N second variables at the first time and calculate the N first variables and the N second variables at the second time.

Furthermore, the function calculation unit 46 according to the fourth example may perform its operations by pipeline processing. In this case as well, the function calculation unit 46 further includes 2M stages of X transfer registers 55-1 to 55-2M, 2M stages of Y transfer registers 56-1 to 56-2M, one X output register 57, and one Y output register 58. Of these components, those already shown in the second example operate in the same manner as in the second example, except for the X output register 57 and the Y output register 58.

In each clock cycle, the (2m-1)th-stage X transfer register 55-(2m-1) acquires the first update value X_{m-1} that the (2m-2)th-stage X transfer register 55-(2m-2) held in the immediately preceding clock cycle and holds it for one clock cycle. In each clock cycle, the (2m-1)th-stage Y transfer register 56-(2m-1) acquires the second update value Y_{m-1} calculated by the (m-1)th FX addition unit 52-(m-1) in the immediately preceding clock cycle and holds it for one clock cycle.

In each clock cycle, the mth FY calculation unit 53-m calculates a first differential value by executing the second function operation on the second update value Y_{m-1} stored in the (2m-1)th-stage Y transfer register 56-(2m-1). In each clock cycle, the mth FY addition unit 54-m calculates a new first update value X_m by adding the first differential value calculated in that clock cycle by the mth FY calculation unit 53-m to the first update value X_{m-1} stored in the (2m-1)th-stage X transfer register 55-(2m-1).

In each clock cycle, the 2mth-stage X transfer register 55-2m acquires the first update value X_m calculated by the mth FY addition unit 54-m in the immediately preceding clock cycle and holds it for one clock cycle. In each clock cycle, the 2mth-stage Y transfer register 56-2m acquires the second update value Y_{m-1} that the (2m-1)th-stage Y transfer register 56-(2m-1) held in the immediately preceding clock cycle and holds it for one clock cycle.

In each clock cycle, the mth FX calculation unit 51-m calculates a second differential value by executing the first function operation on the first update value X_m stored in the 2mth-stage X transfer register 55-2m. In each clock cycle, the mth FX addition unit 52-m calculates a new second update value Y_m by adding the second differential value calculated in that clock cycle by the mth FX calculation unit 51-m to the second update value Y_{m-1} stored in the 2mth-stage Y transfer register 56-2m.

In each clock cycle, the X output register 57 acquires the first update value X_M stored in the 2Mth-stage X transfer register 55-2M and stores it in the first variable memory 40 as one first variable X_OUT at the second time.

In each clock cycle, the Y output register 58 acquires the second update value Y_M calculated by the Mth FX addition unit 52-M in the immediately preceding clock cycle and stores it in the second variable memory 42 as one second variable Y_OUT at the second time.

By executing such pipeline processing, the function calculation unit 46 according to the fourth example can execute the following operations in each clock cycle.

X_1 = FY(Y_IN) + X_IN
Y_1 = FX(X_1) + Y_IN
…
X_m = FY(Y_{m-1}) + X_{m-1}
Y_m = FX(X_m) + Y_{m-1}
…
X_M = FY(Y_{M-1}) + X_{M-1}
Y_M = FX(X_M) + Y_{M-1}
X_OUT = X_M
Y_OUT = Y_M

For a variable pair consisting of one first variable X_IN and one second variable Y_IN with the same index i, the function calculation unit 46 according to the fourth example can execute the operation set of second function operation → FY addition → first function operation → FX addition M times. Furthermore, because the function calculation unit 46 according to the fourth example operates by pipeline processing, it can operate on a plurality of variable pairs in parallel. The function calculation unit 46 according to the fourth example can thereby complete the operations on the N variable pairs in a short time.

(Third embodiment)
A computing device 10 according to the third embodiment will now be described.

FIG. 9 is a diagram showing the relationship among the N first intermediate variables x_i', the coefficient matrix J, and the N second intermediate variables b_i in the third embodiment.

The coefficient matrix J is divided into P_r3 split matrices (JG0, JG1, ..., JGk, ..., JG(P_r3-1)). Each of the P_r3 split matrices contains coefficients J_{i,j} in (N/P_r3) rows × N columns. P_r3 is a divisor of N, and k is any integer from 0 to P_r3-1.

The N second intermediate variables b_i are likewise divided into P_r3 blocks (BG0, BG1, ..., BGk, ..., BG(P_r3-1)). Each of the P_r3 blocks contains (N/P_r3) second intermediate variables b_i. Furthermore, the P_r3 blocks are associated one-to-one with the P_r3 split matrices. For example, the kth block BGk corresponds to the kth split matrix JGk.

In the third embodiment, the matrix multiplication unit 28 calculates the P_r3 blocks by separately matrix-multiplying the N first intermediate variables x_i' by each of the P_r3 split matrices.
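The row-wise split can be sketched as follows, assuming b_i is the sum over j of J_{i,j}·x_j'. The function names and the small integer matrix are illustrative assumptions; the sketch only shows that multiplying each (N/P_r3)-row split matrix separately and concatenating the blocks reproduces the full product.

```python
from typing import List

Matrix = List[List[int]]


def split_rows(j: Matrix, p_r3: int) -> List[Matrix]:
    """Divide J into p_r3 split matrices of (N / p_r3) rows by N columns each."""
    n = len(j)
    if p_r3 <= 0 or n % p_r3 != 0:
        raise ValueError("p_r3 must be a positive divisor of N")
    rows = n // p_r3
    return [j[k * rows:(k + 1) * rows] for k in range(p_r3)]


def block_matvec(split: Matrix, x: List[int]) -> List[int]:
    """One split matrix multiplication unit: (N / p_r3) values of b from all N values of x'."""
    return [sum(c * v for c, v in zip(row, x)) for row in split]


def partitioned_matvec(j: Matrix, x: List[int], p_r3: int) -> List[int]:
    """Compute every block BGk independently, then concatenate them in order."""
    b: List[int] = []
    for split in split_rows(j, p_r3):
        b.extend(block_matvec(split, x))
    return b


J = [[0, 1, -1, 2],
     [1, 0, 3, -1],
     [-1, 3, 0, 1],
     [2, -1, 1, 0]]
x_prime = [1, -1, 1, 1]
b_blocks = partitioned_matvec(J, x_prime, p_r3=2)
print(b_blocks)
```

Because each block depends only on its own rows of J and on the full vector x', the blocks can be computed by separate units with no data exchanged between them other than x' itself.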

FIG. 10 is a diagram showing the configuration of the matrix multiplication unit 28 according to the third embodiment together with the time evolution unit 30.

The matrix multiplication unit 28 has P_r3 split matrix multiplication units 60 associated one-to-one with the P_r3 split matrices. Each of the P_r3 split matrix multiplication units 60 calculates the (N/P_r3) second intermediate variables b_i included in the corresponding block by matrix-multiplying the N first intermediate variables x_i' by the corresponding split matrix.

The P_r3 split matrix multiplication units 60 have the same configuration. The P_r3 split matrix multiplication units 60 may be, for example, circuits implemented in mutually different semiconductor devices.

Each of the P_r3 split matrix multiplication units 60 receives a first intermediate stream X', which is the N first intermediate variables x_i' in stream form, and a second intermediate stream B, which is the N second intermediate variables b_i in stream form. Each of the P_r3 split matrix multiplication units 60 also transmits the first intermediate stream X' and the second intermediate stream B.

The P_r3 split matrix multiplication units 60 are connected in series. The first split matrix multiplication unit 60 receives the first intermediate stream X' from the time evolution unit 30. The first split matrix multiplication unit 60 also receives dummy data as the second intermediate stream B. The dummy data may be supplied from any block. Each split matrix multiplication unit 60 other than the first receives the first intermediate stream X' and the second intermediate stream B transmitted from the immediately preceding split matrix multiplication unit 60.

Each split matrix multiplication unit 60 other than the last transmits the first intermediate stream X' and the second intermediate stream B to the immediately following split matrix multiplication unit 60. The last split matrix multiplication unit 60 transmits the second intermediate stream B to the time evolution unit 30. The first intermediate stream X' transmitted from the last split matrix multiplication unit 60 is discarded. Alternatively, the first intermediate stream X' transmitted from the last split matrix multiplication unit 60 may be received by the time evolution unit 30 and then discarded inside the time evolution unit 30.

Each of the P_r3 split matrix multiplication units 60 has a buffer unit 62, a split matrix memory 64, an execution unit 66, and a selector 68.

The buffer unit 62 of the first split matrix multiplication unit 60 acquires the first intermediate stream X' output from the time evolution unit 30, stores it for a fixed time, and then outputs it. The buffer unit 62 of each split matrix multiplication unit 60 other than the first acquires the first intermediate stream X' output from the immediately preceding split matrix multiplication unit 60, stores it for a fixed time, and then outputs it.

The split matrix memory 64 stores the coefficients J_{i,j} in (N/P_r3) rows × N columns included in the corresponding split matrix. The execution unit 66 calculates the (N/P_r3) second intermediate variables b_i included in the corresponding block on the basis of the first intermediate stream X' stored in the buffer unit 62 and the split matrix stored in the split matrix memory 64.

Here, the execution unit 66 outputs the (N/P_r3) second intermediate variables b_i included in the corresponding block as the second intermediate stream B, which carries P_r1 second intermediate variables b_i per clock cycle. P_r1 is a divisor of N.

Furthermore, each execution unit 66 outputs the second intermediate stream B in clock cycles different from those used by the execution units 66 of the other split matrix multiplication units 60. Consequently, the second intermediate stream B is never output by more than one split matrix multiplication unit 60 in the same clock cycle.

In a clock cycle in which the execution unit 66 of the first split matrix multiplication unit 60 outputs the second intermediate stream B, the selector 68 of the first split matrix multiplication unit 60 selects and outputs that second intermediate stream B. In a clock cycle in which the execution unit 66 of the first split matrix multiplication unit 60 does not output the second intermediate stream B, the selector 68 of the first split matrix multiplication unit 60 selects and outputs the dummy data.

Likewise, in a clock cycle in which the execution unit 66 of the kth split matrix multiplication unit 60 (other than the first) outputs the second intermediate stream B, the selector 68 of the kth split matrix multiplication unit 60 selects and outputs that second intermediate stream B. In a clock cycle in which the execution unit 66 of the kth split matrix multiplication unit 60 does not output the second intermediate stream B, the selector 68 selects and outputs the second intermediate stream B output by the selector 68 of the preceding split matrix multiplication unit 60.

FIG. 11 is a diagram showing an implementation example of the matrix multiplication unit 28 and the time evolution unit 30 according to the third embodiment.

The matrix multiplication unit 28, with its P_r3 split matrix multiplication units 60, and the time evolution unit 30 according to the third embodiment can be implemented, for example, on (P_r3+1) chips 70-1 to 70-(P_r3+1), each of which is an independent semiconductor chip.

Each of the first to P_r3-th chips 70-1 to 70-P_r3 includes a split matrix multiplication unit 60. The (P_r3+1)-th chip 70-(P_r3+1) includes the time evolution unit 30.

Each chip 70 also includes a receiving unit 74 and a transmitting unit 76. The receiving unit 74 is a link port for receiving data. The transmitting unit 76 is a link port for transmitting data. The receiving unit 74 and the transmitting unit 76 may together form a transceiver that performs full-duplex communication. Each chip 70 may also include further communication ports. For example, each chip 70 may include two independent receive link ports and two independent transmit link ports.

The transmitting unit 76 combines the first intermediate variable x_i' and the second intermediate variable b_i for one clock cycle and outputs them as a combined stream. If the bit width of the first intermediate stream X' is Wx' and the bit width of the second intermediate stream B is Wb, the combined stream has a bit width of at least Wx'+Wb.

Each of the receiving unit 74 and the transmitting unit 76 includes, for example, a FIFO memory. The transmitting unit 76 combines the first intermediate stream X' and the second intermediate stream B and writes the result into its FIFO. The transmitting unit 76 then transmits the combined stream stored in the FIFO in order. The receiving unit 74 writes the received combined stream into its FIFO. The receiving unit 74 then reads the combined stream from the FIFO in order and separates it into the first intermediate stream X' and the second intermediate stream B.
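One way to picture the combining and separating steps is simple bit concatenation through a queue. The sketch below is only an illustration under stated assumptions: the helper names `pack` and `unpack`, the choice to place X' in the high bits, the bit widths `WX` and `WB`, and the use of `collections.deque` as a stand-in for the FIFO memory are all hypothetical and are not specified by the embodiment.

```python
from collections import deque


def pack(x_word, b_word, wb):
    """Concatenate one cycle of X' and B into one word (X' in the high bits)."""
    return (x_word << wb) | b_word


def unpack(word, wb):
    """Split a combined word back into its X' part and its B part."""
    return word >> wb, word & ((1 << wb) - 1)


WX, WB = 8, 16                           # illustrative bit widths only
tx_fifo = deque()
tx_fifo.append(pack(0xA5, 0x1234, WB))   # transmitting side: combine, then queue
rx_fifo = deque([tx_fifo.popleft()])     # link transfer into the receiving FIFO
x_word, b_word = unpack(rx_fifo.popleft(), WB)
```

With these widths the combined word fits in exactly WX+WB bits, which matches the lower bound on the combined stream width stated above.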

Two chips 70 are connected to each other via a communication link 72. The output terminal of the first chip 70-1 is connected to the input terminal of the second chip 70-2 via the first communication link 72-1. The output terminal of the k-th chip 70-k is connected to the input terminal of the (k+1)-th chip 70-(k+1) via the k-th communication link 72-k. The output terminal of the (P_r3+1)-th chip 70-(P_r3+1) is connected to the input terminal of the first chip 70-1 via the (P_r3+1)-th communication link 72-(P_r3+1). In this way, the (P_r3+1) chips 70-1 to 70-(P_r3+1) are interconnected in a ring topology.
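The ring wiring described above can be enumerated with a short sketch. The function name `ring_links` and the tuple representation are assumptions for illustration; chips are numbered from 1 as in the text.

```python
def ring_links(p_r3):
    """Return (source_chip, destination_chip) pairs for the P_r3+1 chips.

    The k-th pair corresponds to the k-th communication link 72-k.
    """
    n_chips = p_r3 + 1
    return [(k, k % n_chips + 1) for k in range(1, n_chips + 1)]
```

For P_r3 = 3, this yields four links, with the last one closing the ring from chip 4 back to chip 1.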

For example, the input and output terminals of the chip 70 may be QSFP (Quad Small Form-factor Pluggable) ports. The communication link 72 may be a QSFP-compatible optical cable or a QSFP-compatible metal cable. The communication link 72 between two chips 70 may also be a high-speed serial link, an Ethernet link, or a peer-to-peer link.

In this embodiment, each chip 70 transmits and receives the combined stream, but the first intermediate stream X' and the second intermediate stream B may instead be transmitted and received separately. In that case, the (P_r3+1) chips 70-1 to 70-(P_r3+1) are interconnected in two separate ring topologies.

Even if the number of first variables x_i to be determined by the optimization problem changes, the computing device 10 according to the third embodiment can accommodate the change by changing the number of split matrix multiplication units 60. Therefore, when the number of first variables x_i to be determined by the optimization problem increases significantly, the computing device 10 can cope by, for example, increasing the number of chips 70 of the same configuration. In addition, because the plurality of split matrix multiplication units 60 execute the matrix operations in parallel, the computing device 10 can solve the optimization problem in a short time.

(Fourth embodiment)
A computing device 10 according to the fourth embodiment will be described.

FIG. 12 is a diagram showing the relationship among the N first intermediate variables x_i', the coefficient matrix J, and the N second intermediate variables b_i in the fourth embodiment. In the fourth embodiment, the N first intermediate variables x_i', the coefficient matrix J, and the N second intermediate variables b_i have the same relationship as in the third embodiment and, in addition, the relationship shown in FIG. 12.

The N first intermediate variables x_i' are divided into N_s data sets. Each of the N_s data sets contains P_c first intermediate variables x_i'. N_s and P_c are divisors of N. For example, the N first intermediate variables x_i' are divided into the N_s data sets x'b0, x'b1, ..., x'bNs.

Each of the P_r3 split matrices is divided into P_r2 sub-matrices. Each of the P_r2 sub-matrices contains P_r1 rows × N columns of coefficients J_i,j. For example, the split matrix JG0 is divided into the P_r2 sub-matrices jb0, jb1, ..., jbs, ..., jbPr2-1.

P_r1 and P_r2 are divisors of N, and P_r1×P_r2×P_r3 equals N. s is any integer from 0 to P_r2-1.

Each of the P_r3 blocks contained in the N second intermediate variables b_i is divided into P_r2 sub-blocks. Each of the P_r2 sub-blocks contains P_r1 second intermediate variables b_i. For example, the block BG0 is divided into the P_r2 sub-blocks bb0, bb1, ..., bbs, ..., bbPr2-1.

The P_r2 sub-blocks contained in the k-th block are associated one-to-one with the P_r2 sub-matrices contained in the k-th split matrix. For example, the s-th sub-block contained in the k-th block BGk corresponds to the s-th sub-matrix jbs contained in the k-th split matrix JGk.
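The three-level partition can be made concrete by mapping a global row index of J to its position in the hierarchy. The sketch below assumes that rows are assigned contiguously (split matrix first, then sub-matrix, then row within the sub-matrix) and that all indices start at 0; the function name `locate_row` is hypothetical and the layout is an assumption, since the text does not spell out the row ordering.

```python
def locate_row(i, p_r1, p_r2):
    """Map global row index i to (k, s, p).

    k: index of the split matrix (and block),
    s: index of the sub-matrix (and sub-block) within it,
    p: row within that sub-matrix.
    Assumes a contiguous layout: i = (k * p_r2 + s) * p_r1 + p.
    """
    k, rest = divmod(i, p_r1 * p_r2)
    s, p = divmod(rest, p_r1)
    return k, s, p
```

Under this layout, the same triple (k, s, p) also identifies the second intermediate variable b_i produced from that row.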

In the fourth embodiment, the split matrix multiplication unit 60 executes matrix multiplication with the N first intermediate variables x_i' for each sub-matrix contained in the corresponding split matrix. For example, the k-th split matrix multiplication unit 60 calculates the P_r2 sub-blocks contained in the k-th block BGk by separately multiplying the N first intermediate variables x_i' by each of the P_r2 sub-matrices contained in the k-th split matrix JGk.
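As a rough software analogue of this per-sub-matrix multiplication, the sketch below computes one block by multiplying each sub-matrix separately and concatenating the resulting sub-blocks. The helper names `matvec` and `block_from_submatrices` and the list-of-rows representation are assumptions for illustration; the hardware performs these products in parallel rather than in a loop.

```python
def matvec(rows, x):
    """Multiply a list of coefficient rows by the vector x."""
    return [sum(j * xv for j, xv in zip(row, x)) for row in rows]


def block_from_submatrices(split_matrix, x, p_r1):
    """Compute one block as the concatenation of its sub-blocks.

    split_matrix holds P_r1 * P_r2 rows; every run of p_r1 consecutive rows is
    treated as one sub-matrix and multiplied by x on its own.
    """
    sub_blocks = [matvec(split_matrix[r:r + p_r1], x)
                  for r in range(0, len(split_matrix), p_r1)]
    return [b for sub in sub_blocks for b in sub]
```

The result is identical to multiplying the whole split matrix by x at once, which is why the sub-matrices can be processed independently.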

FIG. 13 is a diagram showing the configuration of the split matrix multiplication unit 60 according to the fourth embodiment. The execution unit 66 included in the split matrix multiplication unit 60 includes P_r2 sub-matrix multiplication units 80 and a multiplexing unit 82.

The P_r2 sub-matrix multiplication units 80 are associated one-to-one with the P_r2 sub-matrices contained in the corresponding split matrix. For example, the P_r2 sub-matrix multiplication units 80 included in the split matrix multiplication unit 60 associated with the k-th split matrix are associated one-to-one with the P_r2 sub-matrices contained in the k-th split matrix.

Each of the P_r2 sub-matrix multiplication units 80 included in the k-th split matrix multiplication unit 60 calculates the P_r1 second intermediate variables b_i contained in the corresponding sub-block by multiplying the N first intermediate variables x_i' by the corresponding sub-matrix.

Each of the P_r2 sub-matrix multiplication units 80 outputs the P_r1 second intermediate variables b_i contained in the corresponding sub-block in parallel in one clock cycle. For example, the s-th sub-matrix multiplication unit 80 included in the k-th split matrix multiplication unit 60 outputs, in the same clock cycle, the P_r1 second intermediate variables b_i contained in the s-th sub-block of the k-th block.

Furthermore, each of the P_r2 sub-matrix multiplication units 80 outputs its P_r1 second intermediate variables b_i in a clock cycle different from those of the other sub-matrix multiplication units 80. In other words, second intermediate variables b_i are never output from more than one sub-matrix multiplication unit 80 in the same clock cycle.

The multiplexing unit 82 multiplexes the sets of P_r1 second intermediate variables b_i output from the respective P_r2 sub-matrix multiplication units 80, thereby generating a second intermediate stream B that contains P_r1 second intermediate variables b_i per clock cycle. For example, the multiplexing unit 82 included in the k-th split matrix multiplication unit 60 outputs the second intermediate stream B containing the P_r2 sub-blocks of the k-th block over P_r2 clock cycles.

The buffer unit 62 included in the split matrix multiplication unit 60 includes P_r2 stages of registers 84 that function as a shift register, an intra-buffer receiving unit 86, and an intra-buffer transmitting unit 88.

The P_r2 stages of registers 84 are associated one-to-one with the P_r2 sub-matrix multiplication units 80. Each of the P_r2 stages of registers 84 stores a data set containing P_c first intermediate variables x_i' for one clock cycle. In the next clock cycle, each register 84 transfers the stored data set containing the P_c first intermediate variables x_i' in parallel to the register 84 of the next stage.

The intra-buffer receiving unit 86 receives the first intermediate stream X' and converts it into a stream with a width of P_c words. The intra-buffer receiving unit 86 then writes a data set containing P_c first intermediate variables x_i' into the first-stage register 84 in each clock cycle.

The intra-buffer transmitting unit 88 reads a data set containing P_c first intermediate variables x_i' from the last-stage register 84 in each clock cycle and converts it into the first intermediate stream X'. The intra-buffer transmitting unit 88 then transmits the first intermediate stream X'.
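The behavior of the register chain can be illustrated with a small simulation. This is a sketch under assumptions: the function name `simulate_shift_register`, the use of `None` for an empty register, and the recording format are hypothetical; only the one-stage-per-cycle shifting follows the description above.

```python
def simulate_shift_register(data_sets, stages):
    """Shift data sets through `stages` registers, one stage per clock cycle.

    Returns, for each stage, the list of (cycle, data_set) pairs it held.
    """
    regs = [None] * stages
    seen = [[] for _ in range(stages)]
    # Extra empty cycles at the end flush the last data sets through.
    feed = list(data_sets) + [None] * (stages - 1)
    for cycle, incoming in enumerate(feed):
        regs = [incoming] + regs[:-1]   # write the first stage, shift the rest
        for idx, value in enumerate(regs):
            if value is not None:
                seen[idx].append((cycle, value))
    return seen
```

Every stage sees the same sequence of data sets, delayed by one clock cycle per stage. This is consistent with each sub-matrix multiplication unit 80 reading the full set of first intermediate variables from its own register 84 at a different time offset.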

Each of the P_r2 sub-matrix multiplication units 80 reads, in each clock cycle, the data set containing P_c first intermediate variables x_i' stored in the corresponding register 84. In each clock cycle, each of the P_r2 sub-matrix multiplication units 80 multiplies each of the P_c first intermediate variables x_i' that it has read by the coefficient J_i,j in the corresponding column of the corresponding sub-matrix. Each of the P_r2 sub-matrix multiplication units 80 then cumulatively adds the products of the first intermediate variables x_i' and the coefficients J_i,j for each row contained in the corresponding sub-matrix. In this way, each of the P_r2 sub-matrix multiplication units 80 can calculate the P_r1 second intermediate variables b_i contained in the corresponding sub-block.

Here, each of the P_r2 sub-matrix multiplication units 80 executes multiplication and cumulative addition during the period in which the N first intermediate variables x_i', from the leading first intermediate variable x_0' to the trailing first intermediate variable x_N-1', are stored in the corresponding register 84. Each of the P_r2 sub-matrix multiplication units 80 then outputs the P_r1 second intermediate variables b_i contained in the corresponding sub-block after a predetermined number of clock cycles have elapsed from the clock cycle in which the trailing first intermediate variable x_N-1' was stored in the corresponding register 84. Accordingly, the P_r2 sub-matrix multiplication units 80 can output the P_r1 second intermediate variables b_i contained in their corresponding sub-blocks in mutually different clock cycles.

The computing device 10 according to the fourth embodiment configured in this way can perform matrix multiplication with a degree of parallelism of P_r3×P_r2. As a result, the computing device 10 according to the fourth embodiment can execute matrix multiplication at high speed.

(Fifth embodiment)
A computing device 10 according to the fifth embodiment will be described.

FIG. 14 is a diagram showing the relationship among the N first intermediate variables x_i', the coefficient matrix J, and the N second intermediate variables b_i in the fifth embodiment. In the fifth embodiment, the N first intermediate variables x_i', the coefficient matrix J, and the N second intermediate variables b_i have the same relationship as in the fourth embodiment and, in addition, the relationship shown in FIG. 14.

Each of the P_r3 split matrices is divided into P_r2 sub-matrices. Each of the P_r2 sub-matrices is further divided, in units of P_c columns, into N_s coefficient sets. Each coefficient set contains P_r1 rows × P_c columns of coefficients J_i,j. For example, the sub-matrix jb0 contained in the split matrix JG0 is divided into the N_s coefficient sets jb0(0), jb0(1), ..., jb0(N_s-1).

The N_s coefficient sets correspond one-to-one to the N_s data sets obtained by dividing the N first intermediate variables x_i' into groups of P_c.

In the fifth embodiment, the sub-matrix multiplication unit 80 calculates, for each of the P_r1 rows contained in one coefficient set, a multiply-accumulate value of the P_c coefficients J_i,j and the corresponding P_c first intermediate variables x_i'. The sub-matrix multiplication unit 80 then adds, for each of the P_r1 rows, all of the multiply-accumulate values calculated for the N_s coefficient sets. In this way, the sub-matrix multiplication unit 80 can multiply and accumulate the N coefficients J_i,j and the N first intermediate variables x_i' for each of the P_r1 rows.

FIG. 15 is a diagram showing the configuration of the sub-matrix multiplication unit 80 according to the sixth embodiment together with the corresponding register 84. Each sub-matrix multiplication unit 80 includes P_r1 multiply-accumulate units 90.

The P_r1 multiply-accumulate units 90 are associated one-to-one with the P_r1 rows contained in the corresponding sub-matrix. For example, the P_r1 multiply-accumulate units 90 associated with the s-th sub-matrix in the split matrix multiplication unit 60 associated with the k-th split matrix are associated one-to-one with the P_r1 rows contained in the s-th sub-matrix of the k-th split matrix.

The P_r1 multiply-accumulate units 90 all have the same configuration. Each of the P_r1 multiply-accumulate units 90 calculates the second intermediate variable b_i at the corresponding position in the corresponding sub-block by multiplying and accumulating the corresponding row of the coefficient matrix J with the N first intermediate variables x_i'. The P_r1 multiply-accumulate units 90 calculate their second intermediate variables b_i in parallel and output them in the same clock cycle.

More specifically, each of the P_r1 multiply-accumulate units 90 acquires, clock cycle by clock cycle, the N first intermediate variables of the first intermediate stream X', from the leading first intermediate variable x_0' to the trailing first intermediate variable x_N-1', which are stored sequentially in the corresponding register 84 among the P_r2 stages of registers 84. In this case, each of the P_r1 multiply-accumulate units 90 acquires a data set containing P_c first intermediate variables x_i' in one clock cycle. Each of the P_r1 multiply-accumulate units 90 acquires N_s data sets by acquiring data sets over N_s (N_s=N/P_c) clock cycles. The p-th multiply-accumulate unit 90 among the P_r1 multiply-accumulate units 90 (p is any integer from 0 to P_r1-1) acquires, in one clock cycle, the P_c coefficients ((jp, s0) to (jp, SPc-1)) contained in the row corresponding to p within the coefficient set that corresponds to the acquired data set. Each of the P_r1 multiply-accumulate units 90 acquires the corresponding P_c coefficients of a coefficient set in this way over N_s clock cycles. In each clock cycle, each of the P_r1 multiply-accumulate units 90 multiplies and accumulates the P_c first intermediate variables x_i' and the corresponding P_c coefficients J_i,j contained in the corresponding row. Each of the P_r1 multiply-accumulate units 90 then calculates the second intermediate variable b_i by cumulatively adding the multiply-accumulate results over N_s clock cycles.

FIG. 16 is a diagram showing the split matrix stored in the split matrix memory 64. The split matrix memory 64 stores the split matrix divided into P_r2 sub-matrices. Each of the P_r2 sub-matrices is output in parallel to the corresponding sub-matrix multiplication unit 80.

Each of the P_r2 sub-matrices contains N_s coefficient sets. Each of the N_s coefficient sets contains P_c×P_r1 coefficients. For each of the P_r2 sub-matrices, the split matrix memory 64 can output in parallel, in one clock cycle, the P_c×P_r1 coefficients contained in one coefficient set.

FIG. 17 is a diagram showing the P_r1 multiply-accumulate units 90 included in one sub-matrix multiplication unit 80 and the sub-matrix transmitted to that sub-matrix multiplication unit 80.

In each clock cycle, the data set containing the P_c first intermediate variables x_i' stored in the corresponding register 84 is broadcast to each of the P_r1 multiply-accumulate units 90 included in one sub-matrix multiplication unit 80.

The split matrix memory 64 contains N_s coefficient sets for each sub-matrix. In each clock cycle, one coefficient set in each sub-matrix is designated by a pointer in the split matrix memory 64. More specifically, for each sub-matrix, the designated coefficient set is the one corresponding to the data set stored in the register 84 that corresponds to that sub-matrix.

In each clock cycle, the split matrix memory 64 outputs in parallel the P_c×P_r1 coefficients contained in the coefficient set designated by the pointer. For each of the P_c×P_r1 coefficients contained in a coefficient set, the destination multiply-accumulate unit 90 is determined in advance. The split matrix memory 64 outputs each of the P_c×P_r1 coefficients contained in one coefficient set to its predetermined multiply-accumulate unit 90. For example, the P_c coefficients from the first to the P_c-th in the coefficient set are output to the first multiply-accumulate unit 90. The P_c coefficients from the (P_c+1)-th to the (2×P_c)-th are output to the second multiply-accumulate unit 90. The P_c coefficients from the ((P_r1-1)×P_c+1)-th to the (P_r1×P_c)-th are output to the P_r1-th multiply-accumulate unit 90.
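The fixed routing of coefficients to multiply-accumulate units can be sketched as a simple slicing of the coefficient set. The function name `route_coefficients` and the flat-list representation are assumptions for illustration. The sketch also assumes that the pattern stated for the first and second units (P_c consecutive coefficients per unit) continues unchanged up to the last unit.

```python
def route_coefficients(coeff_set, p_c, p_r1):
    """Split a flat coefficient set of P_c * P_r1 values into per-unit slices.

    Element u of the result is the list of P_c coefficients delivered to the
    (u+1)-th multiply-accumulate unit 90.
    """
    assert len(coeff_set) == p_c * p_r1
    return [coeff_set[u * p_c:(u + 1) * p_c] for u in range(p_r1)]
```

With P_c = 4 and P_r1 = 3, coefficients 1 to 4 go to the first unit, 5 to 8 to the second, and 9 to 12 to the third.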

FIG. 18 is a diagram showing the configuration of the multiply-accumulate unit 90. The multiply-accumulate unit 90 includes P_c multipliers 92, an adder 94, and an accumulator 96. In each clock cycle, the multiply-accumulate unit 90 acquires a data set containing P_c first intermediate variables x_i' and a coefficient set containing P_c coefficients J_i,j.

The P_c first intermediate variables x_i' contained in the data set correspond one-to-one to the P_c coefficients J_i,j contained in the coefficient set. The P_c multipliers 92 likewise correspond one-to-one to the P_c first intermediate variables x_i' contained in the data set and to the P_c coefficients J_i,j contained in the coefficient set. Each of the P_c multipliers 92 outputs a product by multiplying the corresponding first intermediate variable x_i' contained in the data set by the corresponding coefficient J_i,j contained in the coefficient set. Each of the P_c multipliers 92 outputs a product in each clock cycle.

In each clock cycle, the adder 94 calculates a multiply-accumulate value by adding the P_c products output from the P_c multipliers 92. In each clock cycle, the accumulator 96 cumulatively adds the multiply-accumulate value output from the adder 94. The accumulator 96 resets the cumulative sum to 0 at the timing (S0=0) at which the leading first intermediate variable x_0' of the first intermediate stream X' is stored in the corresponding register 84.
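A behavioral sketch of one multiply-accumulate unit is given below. It is an illustration only: the class name `MultiplyAccumulateUnit`, the `clock` method, and the `first` flag that stands in for the reset timing are hypothetical, and the model ignores pipeline latency and number formats.

```python
class MultiplyAccumulateUnit:
    """Behavioral model of P_c multipliers, one adder, and one accumulator."""

    def __init__(self, p_c):
        self.p_c = p_c
        self.acc = 0

    def clock(self, data_set, coeffs, first=False):
        """Process one clock cycle and return the running sum.

        `first` marks the cycle whose data set starts with x_0'.
        """
        assert len(data_set) == len(coeffs) == self.p_c
        if first:
            self.acc = 0                                      # reset at stream head
        products = [x * j for x, j in zip(data_set, coeffs)]  # P_c multipliers
        self.acc += sum(products)                             # adder, then accumulator
        return self.acc
```

Calling `clock` once per data set for N_s consecutive cycles, with `first=True` on the first call, leaves the dot product of one row of J with the N first intermediate variables in `acc`.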

The multiply-accumulate unit 90 configured in this way acquires, clock cycle by clock cycle, the N first intermediate variables x_i' of the first intermediate stream X', from the leading first intermediate variable x_0' to the trailing first intermediate variable x_N-1', which are stored sequentially in the corresponding register 84 among the P_r2 stages of registers 84. The multiply-accumulate unit 90 multiplies each acquired first intermediate variable x_i' by the coefficient J_i,j that lies in the corresponding row of the coefficient matrix J and in the column corresponding to that first intermediate variable x_i'. The multiply-accumulate unit 90 then cumulatively adds the products of the first intermediate variables x_i' and the coefficients J_i,j from the leading first intermediate variable x_0' to the trailing first intermediate variable x_N-1' of the first intermediate stream X'. Through this processing, the multiply-accumulate unit 90 can calculate the second intermediate variable b_i at the corresponding position in the corresponding sub-block.

FIG. 19 is a diagram showing an example of specific parameter values and an example of processing timing in the fifth embodiment.

In the example of FIG. 19, N=16384, P_c=4, and N_s=4096. Also in the example of FIG. 19, P_r1=8, P_r2=256, and P_r3=8. In FIG. 19, t0, t1, t2, ..., t6143 represent the order of clock cycles.
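The example values can be checked against the relationships stated for the partition. The sketch below only performs that arithmetic; the variable names are hypothetical, and the derived counts (total multiply-accumulate units, total multipliers, total register stages) are quantities computed here from the stated structure rather than figures quoted from FIG. 19.

```python
N, P_C = 16384, 4
P_R1, P_R2, P_R3 = 8, 256, 8

n_s = N // P_C                      # data sets (and coefficient sets) per pass of X'
rows_covered = P_R1 * P_R2 * P_R3   # rows of J handled across all units
mac_units = P_R3 * P_R2 * P_R1      # multiply-accumulate units 90 in total
multipliers = mac_units * P_C       # multipliers 92 in total
register_stages = P_R2 * P_R3       # registers 84 chained across all units
```

These values give n_s = 4096, matching N_s, and rows_covered = 16384, matching the condition that P_r1×P_r2×P_r3 equals N.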

The leading data set of the N first intermediate variables x_i' is stored in the first register 84 of the first matrix multiplication unit 28 at t0. Accordingly, the leading coefficient set in the sub-matrix jb0 of the split matrix JG0 is read at t0.

The trailing data set of the N first intermediate variables x_i' is stored in the first register 84 of the first matrix multiplication unit 28 at t4095. Accordingly, the trailing coefficient set in the sub-matrix jb0 of the split matrix JG0 is read at t4095.

The leading block of the N second intermediate variables b_i can be output from the clock cycle (t4095) in which the multiply-accumulate operation for the trailing coefficient set in the sub-matrix jb0 of the split matrix JG0 is completed. Accordingly, the leading block of the second intermediate variables b_i is output at t4095+α (α is a predetermined delay time).

The leading data set of the N first intermediate variables x_i' is stored in the last register 84 of the last matrix multiplication unit 28 after being delayed by (P_r2×P_r3=2048) clock cycles from t0. Accordingly, the leading coefficient set in the sub-matrix jbPr2-1 of the split matrix JGPr3-1 is read at t2047.

The trailing data set of the N first intermediate variables x_i' is stored in the last register 84 of the last matrix multiplication unit 28 after being delayed by (P_r2×P_r3=2048) clock cycles from t4095. Accordingly, the trailing coefficient set in the sub-matrix jbPr2-1 of the split matrix JGPr3-1 is read at t6143.

The trailing block of the N second intermediate variables b_i can be output from the clock cycle (t6143) in which the multiply-accumulate operation for the trailing coefficient set in the sub-matrix jbPr2-1 of the split matrix JGPr3-1 is completed. Accordingly, the trailing block of the second intermediate variables b_i is output at t6143+α.

このような構成の第５実施形態に係る計算装置１０は、Ｐ_ｒ３×Ｐ_ｒ２×Ｐ_ｒ１の並列度で行列乗算をすることができる。これにより、第５実施形態に係る計算装置１０によれば、行列乗算を高速に実行することができる。 The computing device 10 having such a configuration according to the fifth embodiment can perform matrix multiplication with a degree of parallelism of _Pr3 × _Pr2 × _Pr1 . As a result, the computing device 10 according to the fifth embodiment can perform matrix multiplication at high speed.
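The P_r3 × P_r2 × P_r1 decomposition can be illustrated with a minimal software sketch. It assumes that rows are assigned contiguously (partition matrix, then sub-matrix, then row) and that P_r1 × P_r2 × P_r3 = N; the function name `blocked_matvec` is hypothetical. The three outer loops run sequentially here, whereas in the described hardware they would correspond to units operating in parallel.

```python
def blocked_matvec(J, x, p_r3, p_r2, p_r1):
    """Compute b = J x by walking a P_r3 x P_r2 x P_r1 partition of the rows."""
    n = len(x)
    assert p_r3 * p_r2 * p_r1 == n, "sketch assumes P_r1 * P_r2 * P_r3 = N"
    b = [0] * n
    for g in range(p_r3):          # one partition matrix -> one block of b
        for s in range(p_r2):      # one sub-matrix -> one sub-block of b
            for r in range(p_r1):  # one row -> one multiply-accumulate unit
                row = (g * p_r2 + s) * p_r1 + r  # assumed contiguous row mapping
                acc = 0
                for col in range(n):
                    acc += J[row][col] * x[col]
                b[row] = acc
    return b


if __name__ == "__main__":
    J = [[(i * 3 + j) % 5 - 2 for j in range(8)] for i in range(8)]
    x = [1, -1, 2, 0, 1, 1, -2, 3]
    direct = [sum(J[i][j] * x[j] for j in range(8)) for i in range(8)]
    print(blocked_matvec(J, x, 2, 2, 2) == direct)
```

The partitioning changes only which unit handles which row, so the result matches an unpartitioned matrix-vector product.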

(Sixth embodiment)
A computing device 10 according to the sixth embodiment will be described.

FIG. 20 is a diagram showing the configuration of the time evolution unit 30 according to the sixth embodiment together with a plurality of matrix multiplication units 28. In the sixth embodiment, each matrix multiplication unit 28 outputs a second intermediate stream B containing P_r1 second intermediate variables b_i per clock cycle. The time evolution unit 30 in turn receives the second intermediate stream B containing P_r1 second intermediate variables b_i per clock cycle.

The time evolution unit 30 according to the sixth embodiment includes P_r1 first variable memories 40, P_r1 second variable memories 42, P_r1 first addition units 44, P_r1 function operation units 46, P_r1 first multiplication units 48, and one first intermediate variable memory 50.

The P_r1 first variable memories 40, the P_r1 second variable memories 42, the P_r1 first addition units 44, the P_r1 function operation units 46, and the P_r1 first multiplication units 48 correspond one-to-one to the P_r1 second intermediate variables b_i contained in one clock cycle.

Each of the P_r1 first addition units 44 and each of the P_r1 function operation units 46 executes arithmetic processing on the corresponding one of the P_r1 second intermediate variables b_i contained in one clock cycle. Each of the P_r1 first variable memories 40 and each of the P_r1 second variable memories 42 stores the first variable x_i and the second variable y_i calculated using the corresponding one of the P_r1 second intermediate variables b_i contained in one clock cycle. Each of the P_r1 first multiplication units 48 executes arithmetic processing on the first variable x_i calculated using the corresponding one of the P_r1 second intermediate variables b_i contained in one clock cycle.

In this way, the time evolution unit 30 according to the sixth embodiment executes the calculation of the N first variables x_i and the N second variables y_i with a degree of parallelism of P_r1 on the P_r1 second intermediate variables b_i contained in each clock cycle. As a result, the computing device 10 according to the sixth embodiment can calculate the N first variables x_i and the N second variables y_i at high speed.
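As a rough illustration of this lane structure, the sketch below assumes that the variable with index i is handled by lane i mod P_r1 in clock cycle i // P_r1; the text does not state this mapping, so it is an assumption, and the names `lane_schedule` and `add_stream` are hypothetical. The only per-lane operation shown is the addition of b_i to y_i performed by a first addition unit.

```python
def lane_schedule(n, p_r1):
    """Return, per clock cycle, the (lane, variable index) pairs processed."""
    assert n % p_r1 == 0, "sketch assumes P_r1 divides N"
    return [[(lane, cycle * p_r1 + lane) for lane in range(p_r1)]
            for cycle in range(n // p_r1)]


def add_stream(y, b_stream, p_r1):
    """Each lane adds its incoming b_i to its own y_i, one chunk per cycle."""
    y = list(y)
    for cycle, chunk in enumerate(b_stream):  # chunk holds P_r1 values
        for lane, b in enumerate(chunk):
            y[cycle * p_r1 + lane] += b       # assumed lane-to-index mapping
    return y


if __name__ == "__main__":
    print(lane_schedule(8, 4))
    print(add_stream([0, 0, 0, 0], [[1, 2], [3, 4]], 2))
```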

FIG. 21 is a diagram showing the output timing of the first intermediate variables x_i' and the second intermediate variables b_i. More specifically, (A) of FIG. 21 shows the timing at which the first intermediate variables x_i' are output from the time evolution unit 30 to the first matrix multiplication unit 28. (B) of FIG. 21 shows the timing at which the second intermediate variables b_i are output from the last matrix multiplication unit 28 to the time evolution unit 30. (C) of FIG. 21 shows the timing at which the time evolution unit 30 calculates the first intermediate variables x_i'.

The first intermediate variables x_i' are transferred with a degree of parallelism of P_c. Accordingly, as shown in (A) of FIG. 21, the transfer period of the first intermediate variables at the first time (x'b0 to x'b4095 (t_1)) is N_S (= N/P_c) clock cycles.

The second intermediate variables b_i are transferred with a degree of parallelism of P_r1. Accordingly, as shown in (B) of FIG. 21, the transfer period of the second intermediate variables at the first time (bb0 to bb2047 (t_1)) is N/P_r1 clock cycles.

The time evolution unit 30 executes its processing with a degree of parallelism of P_r1. Accordingly, as shown in (C) of FIG. 21, the calculation period for the second intermediate variables at the second time (bb0 to bb2047 (t_2)) is N/P_r1 clock cycles.

Here, L_a is the delay from the completion of output of the last first intermediate variable at the first time (x'b4095 (t_1)) to the start of output of the first second intermediate variable at the first time (bb0 (t_1)). L_b is the delay from the start of output of the first second intermediate variable at the first time (bb0 (t_1)) to the calculation of the first first intermediate variable at the second time (x'b0 (t_2)). L_c is the delay from the calculation of the first first intermediate variable at the second time (x'b0 (t_2)) to the output of that variable.

The period from the start of one loop iteration to the start of the next (one loop period) is N_S + L_a + L_b + L_c. The computing device 10 can make (L_b + L_c) shorter than the transfer period (N/P_r1) of the second intermediate variables at the first time (bb0 to bb2047 (t_1)). Accordingly, in the computing device 10, output of the first intermediate variables at the second time (x'b0 to x'b4095 (t_2)) starts before output of the second intermediate variables at the first time (bb0 to bb2047 (t_1)) completes. That is, the computing device 10 overlaps successive loop iterations. This overlap allows the computing device 10 to execute processing at high speed.
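The timing relations above can be checked numerically with a small sketch. The function names and the numeric values used in the example are hypothetical and are not taken from the embodiment; only the formulas N_S = N/P_c, the loop period N_S + L_a + L_b + L_c, and the overlap condition (L_b + L_c) < N/P_r1 come from the text.

```python
def loop_period(n, p_c, l_a, l_b, l_c):
    """One loop period in clock cycles: N_S + L_a + L_b + L_c, with N_S = N / P_c."""
    assert n % p_c == 0, "sketch assumes P_c divides N"
    n_s = n // p_c
    return n_s + l_a + l_b + l_c


def loops_overlap(n, p_r1, l_b, l_c):
    """True when (L_b + L_c) is shorter than the b transfer period N / P_r1."""
    return (l_b + l_c) < n / p_r1


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the formulas.
    print(loop_period(n=2048, p_c=8, l_a=30, l_b=10, l_c=5))
    print(loops_overlap(n=2048, p_r1=8, l_b=10, l_c=5))
```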

When N is large, N_S >> L_a + L_b + L_c. Accordingly, when N is large, L_a + L_b + L_c becomes negligibly short relative to the overall execution time. In this case, the matrix multiplication time in the computing device 10 can be regarded as N_S (= N/P_c) clock cycles.

For example, the time required for a single-core processor to perform a matrix operation of size N is (N × N) clock cycles. The ratio of the matrix multiplication time in the computing device 10 to the matrix multiplication time of a single-core processor is (N/P_c)/(N × N). Accordingly, the computing device 10 can execute matrix multiplication in 1/(P_c × N) of the time required by a single-core processor.
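Written out as a worked equation, with T_device and T_single as labels introduced here only for illustration (they do not appear in the original text), the ratio simplifies as follows:

```latex
\frac{T_{\mathrm{device}}}{T_{\mathrm{single}}}
  = \frac{N / P_c}{N \times N}
  = \frac{1}{P_c \times N}
```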

While several embodiments of the present invention have been described, these embodiments are presented by way of example and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and modifications can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

10 computing device
20 calculation unit
22 input unit
24 output unit
26 setting unit
28 matrix multiplication unit
30 time evolution unit
32 management unit
36 coefficient matrix memory
38 matrix multiplication execution unit
40 first variable memory
42 second variable memory
44 first addition unit
46 function operation unit
48 first multiplication unit
50 first intermediate variable memory
60 partition matrix multiplication unit
62 buffer unit
64 partition matrix memory
66 execution unit
68 selector
80 sub-matrix multiplication unit
82 multiplexing unit
84 register
86 in-buffer reception unit
88 in-buffer transmission unit
90 multiply-accumulate unit
92 multiplier
94 adder
96 accumulator

Claims

1. Circuit information representing, in a hardware description language, the configuration of a circuit,
the circuit information causing the circuit to function as a computing device that solves an optimization problem using an Ising model,
the computing device comprising:
a first variable memory;
a second variable memory;
a matrix multiplication unit that calculates N second intermediate variables at a first time by matrix-multiplying N (N is an integer of 2 or more) first intermediate variables at the first time by a coefficient matrix containing preset N rows × N columns of coefficients;
a time evolution unit that calculates, based on the N second intermediate variables at the first time, N first variables at a second time one sampling period after the first time and N second variables at the second time, writes the N first variables at the second time to the first variable memory, and writes the N second variables at the second time to the second variable memory;
a management unit that increments the time by one sampling period at a time from a start time and causes the matrix multiplication unit and the time evolution unit to execute processing for each time; and
an output unit that outputs the N first variables at a preset end time,
wherein
the N first variables correspond to N spins in the Ising model,
the N second variables correspond to the N spins,
the N spins correspond to N points,
each of the N first variables represents the position of the point, among the N points, that corresponds to the corresponding spin,
each of the N second variables represents the momentum of the point, among the N points, that corresponds to the corresponding spin,
the N first intermediate variables correspond to the N first variables, each of the N first intermediate variables being either the corresponding first variable among the N first variables or a value obtained by multiplying the corresponding first variable by a preset coefficient, and
the N second intermediate variables correspond to the N second variables.
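The division of work among the claimed units can be pictured with a minimal software sketch. This is not the claimed circuit: `run_solver`, `time_evolve`, and `coeff` are hypothetical names, the time evolution step is passed in as a callback because its internals are specified only in the dependent claims, and the memories are modeled as plain Python lists.

```python
def run_solver(x0, y0, J, steps, time_evolve, coeff=1.0):
    """Management-unit loop: one matrix multiply and one time evolution per time."""
    x, y = list(x0), list(y0)               # first and second variable memories
    n = len(x)
    for t in range(steps):                  # time advances one sampling period
        x_int = [coeff * xi for xi in x]    # first intermediate variables
        b = [sum(J[i][j] * x_int[j] for j in range(n))
             for i in range(n)]             # second intermediate variables
        x, y = time_evolve(x, y, b, t)      # new contents of both memories
    return x                                # output unit: x at the end time


if __name__ == "__main__":
    # Toy time evolution used only to exercise the loop.
    toy = lambda x, y, b, t: ([xi + bi for xi, bi in zip(x, b)], y)
    print(run_solver([1, 2], [0, 0], [[1, 0], [0, 1]], 2, toy))
```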

2. Circuit information that is written to a reconfigurable semiconductor device to operate the reconfigurable semiconductor device,
the circuit information causing the reconfigurable semiconductor device to function as a computing device that solves an optimization problem using an Ising model,
the computing device comprising:
a first variable memory;
a second variable memory;
a matrix multiplication unit that calculates N second intermediate variables at a first time by matrix-multiplying N (N is an integer of 2 or more) first intermediate variables at the first time by a coefficient matrix containing preset N rows × N columns of coefficients;
a time evolution unit that calculates, based on the N second intermediate variables at the first time, N first variables at a second time one sampling period after the first time and N second variables at the second time, writes the N first variables at the second time to the first variable memory, and writes the N second variables at the second time to the second variable memory;
a management unit that increments the time by one sampling period at a time from a start time and causes the matrix multiplication unit and the time evolution unit to execute processing for each time; and
an output unit that outputs the N first variables at a preset end time,
wherein
the N first variables correspond to N spins in the Ising model,
the N second variables correspond to the N spins,
the N spins correspond to N points,
each of the N first variables represents the position of the point, among the N points, that corresponds to the corresponding spin,
each of the N second variables represents the momentum of the point, among the N points, that corresponds to the corresponding spin,
the N first intermediate variables correspond to the N first variables, each of the N first intermediate variables being either the corresponding first variable among the N first variables or a value obtained by multiplying the corresponding first variable by a preset coefficient, and
the N second intermediate variables correspond to the N second variables.

3. The circuit information according to claim 1, wherein the time evolution unit further calculates the N second intermediate variables at the second time based on the N first variables at the second time.

4. The circuit information according to claim 3, wherein the time evolution unit includes a first addition unit that updates the N second variables at the first time by adding the N second intermediate variables at the first time to the N second variables at the first time.

5. The circuit information according to claim 4, wherein
the time evolution unit further includes:
a first FX operation unit;
a first FX addition unit;
a first FY operation unit; and
a first FY addition unit,
the first FX operation unit calculates N second differential values by performing a first function operation on each of the N first variables at the first time,
the first FX addition unit calculates N second update values by adding the N second differential values calculated by the first FX operation unit to the updated N second variables calculated by the first addition unit,
the first FY operation unit calculates N first differential values by performing a second function operation on each of the N second update values calculated by the first FX addition unit, and
the first FY addition unit calculates N first update values by adding the N first differential values calculated by the first FY operation unit to the N first variables at the first time.

6. The circuit information according to claim 4, wherein
the time evolution unit further includes:
a first FY operation unit;
a first FY addition unit;
a first FX operation unit; and
a first FX addition unit,
the first FY operation unit calculates N first differential values by performing a second function operation on each of the updated N second variables calculated by the first addition unit,
the first FY addition unit calculates N first update values by adding the N first differential values calculated by the first FY operation unit to the N first variables at the first time,
the first FX operation unit calculates N second differential values by performing a first function operation on each of the N first update values calculated by the first FY addition unit, and
the first FX addition unit calculates N second update values by adding the N second differential values calculated by the first FX operation unit to the updated N second variables calculated by the first addition unit.

7. The circuit information according to claim 5, wherein
the time evolution unit further includes:
second to M-th (M is an integer of 2 or more) (M−1) FX operation units;
second to M-th (M−1) FX addition units;
second to M-th (M−1) FY operation units; and
second to M-th (M−1) FY addition units,
the m-th (m is an integer from 2 to M) FX operation unit calculates N second differential values by performing the first function operation on each of the N first update values calculated by the (m−1)-th FY addition unit,
the m-th FX addition unit calculates N new second update values by adding the N second differential values calculated by the m-th FX operation unit to the N second update values calculated by the (m−1)-th FX addition unit,
the m-th FY operation unit calculates N first differential values by performing the second function operation on each of the N second update values calculated by the m-th FX addition unit, and
the m-th FY addition unit calculates N new first update values by adding the N first differential values calculated by the m-th FY operation unit to the N first update values calculated by the (m−1)-th FY addition unit.
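The chain of a first addition unit followed by M alternating FX and FY stages can be sketched in software as below. The sketch assumes the FX-then-FY ordering of this claim; the names `time_evolve_fx_fy`, `fx`, and `fy` are hypothetical, and the two function operations are passed in as callbacks rather than fixed to any particular expression.

```python
def time_evolve_fx_fy(x, y, b, fx, fy, m_stages):
    """One sampling period: y += b, then M alternating FX/FY update stages."""
    y = [yi + bi for yi, bi in zip(y, b)]  # first addition unit
    for _ in range(m_stages):              # stages m = 1 .. M
        # FX operation (on current x) followed by FX addition (into y)
        y = [yi + fx(xi, i) for i, (xi, yi) in enumerate(zip(x, y))]
        # FY operation (on the just-updated y) followed by FY addition (into x)
        x = [xi + fy(yi) for xi, yi in zip(x, y)]
    return x, y


if __name__ == "__main__":
    # Toy callbacks used only to exercise the staging order.
    print(time_evolve_fx_fy([1.0], [0.0], [0.5],
                            lambda xi, i: -xi, lambda yi: yi, 2))
```

Swapping the two statements inside the loop would give the FY-then-FX ordering described in the alternative arrangement.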

8. The circuit information according to claim 6, wherein
the time evolution unit further includes:
second to M-th (M is an integer of 2 or more) (M−1) FY operation units;
second to M-th (M−1) FY addition units;
second to M-th (M−1) FX operation units; and
second to M-th (M−1) FX addition units,
the m-th (m is an integer from 2 to M) FY operation unit calculates N first differential values by performing the second function operation on each of the N second update values calculated by the (m−1)-th FX addition unit,
the m-th FY addition unit calculates N new first update values by adding the N first differential values calculated by the m-th FY operation unit to the N first update values calculated by the (m−1)-th FY addition unit,
the m-th FX operation unit calculates N second differential values by performing the first function operation on each of the N first update values calculated by the m-th FY addition unit, and
the m-th FX addition unit calculates N new second update values by adding the N second differential values calculated by the m-th FX operation unit to the N second update values calculated by the (m−1)-th FX addition unit.

9. The circuit information according to any one of claims 5 to 8, wherein
the first function operation is
dt' × [{−D + p − K x_i^2} x_i − c × h_i × a] (101),
the second function operation is
dt' × D × y_i (102),
x_i is the i-th first variable among the N first variables at the first time, or the i-th first update value among the N first update values,
y_i is the i-th updated second variable among the updated N second variables calculated by the first addition unit, or the i-th second update value among the N second update values,
dt' is a preset minute time,
D, c, and K are preset constants,
h_i is a coefficient set for each i, and
p and a are values that increase with time according to a predetermined arithmetic expression.
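Expressions (101) and (102) translate directly into code. In the sketch below the function and parameter names are hypothetical; p and a are passed in as plain arguments because the arithmetic expression that makes them increase with time is not given in the claim.

```python
def first_function(x_i, h_i, dt, D, p, K, c, a):
    """Expression (101): dt' * [(-D + p - K * x_i^2) * x_i - c * h_i * a]."""
    return dt * ((-D + p - K * x_i ** 2) * x_i - c * h_i * a)


def second_function(y_i, dt, D):
    """Expression (102): dt' * D * y_i."""
    return dt * D * y_i


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to exercise the formulas.
    print(first_function(x_i=2.0, h_i=1.0, dt=0.5, D=1.0, p=3.0, K=1.0, c=2.0, a=1.0))
    print(second_function(y_i=2.0, dt=0.5, D=1.0))
```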

10. The circuit information according to any one of claims 5 to 8, wherein
the first function operation is
dt' × {[(−D + p)(1 + x_i^n) − K x_i^(n+2)] x_i − c × h_i × a} (103),
the second function operation is
dt' × D × y_i (104),
x_i is the i-th first variable among the N first variables at the first time, or the i-th first update value among the N first update values,
y_i is the i-th updated second variable among the updated N second variables calculated by the first addition unit, or the i-th second update value among the N second update values,
dt' is a preset minute time,
D, c, and K are preset coefficients,
h_i is a coefficient set for each i, and
p and a are values that increase with time according to a predetermined arithmetic expression.
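Expression (103) can be written out the same way. The function name is hypothetical, and the exponent n, which the claim does not define, is simply left as a parameter here; that treatment is an assumption of this sketch.

```python
def first_function_n(x_i, h_i, n, dt, D, p, K, c, a):
    """Expression (103):
    dt' * {[(-D + p) * (1 + x_i^n) - K * x_i^(n+2)] * x_i - c * h_i * a}."""
    inner = (-D + p) * (1 + x_i ** n) - K * x_i ** (n + 2)
    return dt * (inner * x_i - c * h_i * a)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to exercise the formula.
    print(first_function_n(x_i=1.0, h_i=1.0, n=2, dt=0.5,
                           D=1.0, p=3.0, K=1.0, c=2.0, a=1.0))
```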

11. The circuit information according to any one of claims 4 to 10, wherein
the time evolution unit further includes:
a first multiplication unit that calculates the N first intermediate variables at the second time by multiplying each of the N first variables at the second time by a preset value; and
a first intermediate variable memory that stores the N first intermediate variables at the second time calculated by the first multiplication unit.

12. The circuit information according to any one of claims 4 to 10, wherein
the time evolution unit further includes:
a first intermediate variable memory that stores the N first variables at the second time as the N first intermediate variables at the second time; and
a first multiplication unit that multiplies each of the N second intermediate variables at the first time output from the matrix multiplication unit by a preset value, and
the first addition unit updates the N second variables at the first time by adding, to the N second variables at the first time, the N second intermediate variables at the first time that have been multiplied by the preset value by the first multiplication unit.

13. The circuit information according to any one of claims 1 to 12, further comprising an input unit that acquires the N first variables at the start time.

14. The circuit information according to any one of claims 1 to 13, wherein
the time evolution unit outputs the N first intermediate variables at the first time to the matrix multiplication unit as a first intermediate stream containing a first number of first intermediate variables per clock cycle, and
the matrix multiplication unit outputs the N second intermediate variables at the first time to the time evolution unit as a second intermediate stream containing a second number of second intermediate variables per clock cycle.

15. The circuit information according to claim 14, wherein
the coefficient matrix is divided into P_r3 partition matrices each containing (N/P_r3) rows × N columns of coefficients (P_r3 is a divisor of N),
the N second intermediate variables are divided into P_r3 blocks each containing (N/P_r3) second intermediate variables,
the P_r3 blocks are associated one-to-one with the P_r3 partition matrices,
the matrix multiplication unit includes P_r3 partition matrix multiplication units associated one-to-one with the P_r3 partition matrices, and
each of the P_r3 partition matrix multiplication units calculates the (N/P_r3) second intermediate variables contained in the corresponding block by matrix-multiplying the first intermediate variables by the corresponding partition matrix.

16. The circuit information according to claim 15, wherein
the P_r3 partition matrix multiplication units are connected in series,
each of the P_r3 partition matrix multiplication units includes a buffer unit,
the buffer unit of the first partition matrix multiplication unit acquires the first intermediate stream output from the time evolution unit, holds the acquired first intermediate stream for a fixed period, and then outputs it, and
the buffer unit of each partition matrix multiplication unit other than the first acquires the first intermediate stream output from the immediately preceding partition matrix multiplication unit, holds the acquired first intermediate stream for a fixed period, and then outputs it.

17. The circuit information according to claim 15 or 16, wherein the P_r3 partition matrix multiplication units and the time evolution unit are each included in a separate one of (P_r3 + 1) independent chips.

18. The circuit information according to claim 17, further comprising (P_r3 + 1) communication links that transmit the first intermediate stream and the second intermediate stream, wherein
the output terminal of the k-th chip (k is an integer of 1 or more and P_r3 or less) among the (P_r3 + 1) chips is connected to the input terminal of the (k+1)-th chip via the k-th communication link among the (P_r3 + 1) communication links, and
the output terminal of the (P_r3 + 1)-th chip is connected to the input terminal of the first chip via the (P_r3 + 1)-th communication link.

19. The circuit information according to claim 16, wherein
each of the P_r3 partition matrices is divided into P_r2 sub-matrices each containing P_r1 rows × N columns of coefficients (P_r1 and P_r2 are divisors of N, and P_r1 × P_r2 × P_r3 = N),
each of the P_r3 blocks is divided into P_r2 sub-blocks each containing P_r1 second intermediate variables,
the P_r2 sub-blocks contained in the k-th block (k is an integer of 1 or more and P_r3 or less) are associated one-to-one with the P_r2 sub-matrices contained in the k-th partition matrix,
each of the P_r3 partition matrix multiplication units further includes P_r2 sub-matrix multiplication units associated one-to-one with the P_r2 sub-matrices contained in the corresponding partition matrix, and
each of the P_r2 sub-matrix multiplication units included in the k-th partition matrix multiplication unit calculates the P_r1 second intermediate variables contained in the corresponding sub-block by matrix-multiplying the N first intermediate variables by the corresponding sub-matrix.

20. The circuit information according to claim 19, wherein
the buffer unit includes P_r2 stages of registers functioning as a shift register,
the P_r2 stages of registers are associated one-to-one with the P_r2 sub-matrix multiplication units, and
each of the P_r2 stages of registers stores P_c (P_c is a divisor of N) first intermediate variables in parallel for one clock cycle and, in the next clock cycle, transfers the stored P_c first intermediate variables in parallel to the next-stage register.
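The shift-register behavior of the buffer unit can be modeled in a few lines. This is a behavioral sketch only: the class name `ShiftBuffer` is hypothetical, each data set is modeled as a tuple of P_c values, and `None` stands for a register that has not yet received data.

```python
class ShiftBuffer:
    """Behavioral model of a buffer unit built from a chain of registers."""

    def __init__(self, stages):
        # One register per sub-matrix multiplication unit (P_r2 stages).
        self.regs = [None] * stages

    def clock(self, data_set):
        """Advance one clock cycle; return the data set leaving the last stage."""
        out = self.regs[-1]
        # Every register passes its contents to the next stage.
        self.regs = [data_set] + self.regs[:-1]
        return out


if __name__ == "__main__":
    buf = ShiftBuffer(stages=2)
    for data in [(1, 2), (3, 4), (5, 6)]:
        print(buf.clock(data), buf.regs)
```

A data set that enters a buffer with P_r2 stages leaves it P_r2 clock cycles later, which is what staggers the start times of the sub-matrix multiplication units.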

21. The circuit information according to claim 20, wherein
each of the P_r2 sub-matrix multiplication units includes P_r1 multiply-accumulate units associated one-to-one with the P_r1 rows contained in the corresponding sub-matrix, and
each of the P_r1 multiply-accumulate units:
acquires, every clock cycle, the N first intermediate variables from the beginning to the end of the first intermediate stream that are sequentially stored in the corresponding register among the P_r2 stages of registers;
multiplies each acquired first intermediate variable by the coefficient, in the corresponding row of the coefficient matrix, of the column corresponding to that first intermediate variable; and
calculates the second intermediate variable at the corresponding position in the corresponding sub-block by accumulating the products of the first intermediate variables and the coefficients, from the first intermediate variable at the beginning of the first intermediate stream to the first intermediate variable at the end.
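A single multiply-accumulate unit can be sketched as a streaming dot product. The class name is hypothetical, and the sketch assumes that data sets arrive in column order, P_c values per clock cycle, so that the unit can track the current column with a simple counter.

```python
class MultiplyAccumulateUnit:
    """Streaming dot product of one coefficient row with the first intermediate stream."""

    def __init__(self, row):
        self.row = row  # coefficients of one row of the coefficient matrix
        self.acc = 0    # running sum of products
        self.col = 0    # column index of the next incoming variable

    def clock(self, data_set):
        """Consume one data set of P_c values; return the sum once the row is done."""
        for x in data_set:
            self.acc += self.row[self.col] * x
            self.col += 1
        return self.acc if self.col == len(self.row) else None


if __name__ == "__main__":
    mac = MultiplyAccumulateUnit([1, 2, 3, 4])
    print(mac.clock([1, 1]))
    print(mac.clock([1, 1]))
```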

22. The circuit information according to any one of claims 19 to 21, wherein each of the P_r2 sub-matrix multiplication units outputs the P_r1 second intermediate variables contained in the corresponding sub-block in parallel in one clock cycle, in a clock cycle different from those of the other sub-matrix multiplication units.

23. The circuit information according to claim 22, wherein each of the P_r3 partition matrix multiplication units further includes a multiplexing unit that generates the second intermediate stream, containing P_r1 second intermediate variables per clock cycle, by multiplexing the P_r1 second intermediate variables output from each of the P_r2 sub-matrix multiplication units.

24. The circuit information according to any one of claims 19 to 23, wherein
each of the P_r3 partition matrix multiplication units further includes an execution unit that calculates the (N/P_r3) second intermediate variables contained in the corresponding block based on the first intermediate stream stored in the buffer unit, and
the execution unit outputs the (N/P_r3) second intermediate variables contained in the corresponding block as the second intermediate stream containing P_r1 (P_r1 is a divisor of N) second intermediate variables per clock cycle, and outputs the second intermediate stream in clock cycles different from those of the other partition matrix multiplication units.

25. The circuit information according to claim 24, wherein
each of the P_r3 partition matrix multiplication units includes a selector,
the selector of the first partition matrix multiplication unit selects and outputs the second intermediate stream output by the execution unit of the first partition matrix multiplication unit in the clock cycles in which that execution unit outputs the second intermediate stream, and
the selector of the k-th partition matrix multiplication unit, other than the first,
selects and outputs the second intermediate stream output by the execution unit of the k-th partition matrix multiplication unit in the clock cycles in which that execution unit outputs the second intermediate stream, and
selects and outputs the second intermediate stream output by the selector of the preceding partition matrix multiplication unit in the clock cycles in which the execution unit of the k-th partition matrix multiplication unit does not output the second intermediate stream.
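The selector chain can be modeled as a cascade in which each unit forwards its own output when it has one and otherwise passes along whatever arrives from the preceding unit. The sketch below is a simplified behavioral model: the function names are hypothetical, `None` marks a cycle in which an execution unit outputs nothing, and any pipeline delay between stages is ignored.

```python
def select(own, upstream):
    """Selector: forward this unit's output if present, else the upstream stream."""
    return own if own is not None else upstream


def merge_streams(per_unit_outputs):
    """per_unit_outputs[k][t] is unit k's execution-unit output in cycle t, or None."""
    cycles = len(per_unit_outputs[0])
    merged = []
    for t in range(cycles):
        stream = None
        for unit in per_unit_outputs:  # first unit first, last unit last
            stream = select(unit[t], stream)
        merged.append(stream)
    return merged


if __name__ == "__main__":
    print(merge_streams([["a", None, None], [None, "b", None]]))
```

Because the execution units output in different clock cycles, at most one of them drives the chain in any cycle, so the last selector emits a single merged second intermediate stream.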

26. The circuit information according to claim 24 or 25, wherein the time evolution unit executes the calculation of the N first variables with a degree of parallelism of P_r1 on the P_r1 second intermediate variables contained in one clock cycle.

27. A computing device that solves an optimization problem using an Ising model, the computing device comprising:
a first variable memory;
a second variable memory;
a matrix multiplication unit that calculates N second intermediate variables at a first time by matrix-multiplying N (N is an integer of 2 or more) first intermediate variables at the first time by a coefficient matrix containing preset N rows × N columns of coefficients;
a time evolution unit that calculates, based on the N second intermediate variables at the first time, N first variables at a second time one sampling period after the first time and N second variables at the second time, writes the N first variables at the second time to the first variable memory, and writes the N second variables at the second time to the second variable memory;
a management unit that increments the time by one sampling period at a time from a start time and causes the matrix multiplication unit and the time evolution unit to execute processing for each time; and
an output unit that outputs the N first variables at a preset end time,
wherein
the N first variables correspond to N spins in the Ising model,
the N second variables correspond to the N spins,
the N spins correspond to N points,
each of the N first variables represents the position of the point, among the N points, that corresponds to the corresponding spin,
each of the N second variables represents the momentum of the point, among the N points, that corresponds to the corresponding spin,
the N first intermediate variables correspond to the N first variables, each of the N first intermediate variables being either the corresponding first variable among the N first variables or a value obtained by multiplying the corresponding first variable by a preset coefficient, and
the N second intermediate variables correspond to the N second variables.

A calculation method for solving an optimization problem using an Ising model by an information processing device,
The information processing device is
a first variable memory;
a second variable memory;
with
A matrix multiplication unit of the information processing device performs matrix multiplication of N first intermediate variables (N is an integer equal to or greater than 2) at a first time and a coefficient matrix containing preset N rows × N columns of coefficients to calculate N second intermediate variables at the first time,
A time evolution unit of the information processing device calculates, based on the N second intermediate variables at the first time, N first variables at a second time one sampling period after the first time and N second variables at the second time, writes the N first variables at the second time to the first variable memory, and writes the N second variables at the second time to the second variable memory,
The management unit of the information processing device increases the time for each sampling period from the start time, and causes the matrix multiplication unit and the time evolution unit to perform processing for each time,
The output unit of the information processing device outputs N first variables at a preset end time,
The N first variables correspond to N spins in the Ising model,
The N second variables correspond to the N spins,
the N spins correspond to N points;
each of the N first variables represents the position of a point corresponding to a corresponding spin among the N points;
each of the N second variables represents the momentum of a point corresponding to the corresponding spin among the N points;
The N first intermediate variables correspond to the N first variables, and each of the N first intermediate variables is the corresponding first variable among the N first variables or a value obtained by multiplying the corresponding first variable by a preset coefficient,
The N second intermediate variables correspond to the N second variables.
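
The overall flow of the method (matrix multiplication, time evolution, stepping of the time by the management unit, and output at the end time) can be sketched as below. This is a sketch under stated assumptions, not the claimed implementation: the passage above does not give the time evolution formula, so the momentum and position updates, the linear ramp `a`, the coupling strength `c`, the clamping of positions to [-1, 1], the random initial momenta, and all names and default values are assumptions chosen for illustration. The sign convention assumed here is that a positive coefficient favors aligned spins.

```python
import numpy as np


def solve_ising(J, n_steps=200, dt=0.5, c=0.5, seed=0):
    """Illustrative position/momentum iteration for an N-spin Ising problem.

    All update formulas and parameter values are assumptions for this sketch.
    """
    J = np.asarray(J, dtype=float)
    n = J.shape[0]
    rng = np.random.default_rng(seed)

    x = np.zeros(n)                    # first variable memory: positions
    y = rng.uniform(-0.1, 0.1, n)      # second variable memory: momenta

    # Management unit: advance the time by one sampling period per iteration.
    for step in range(n_steps):
        a = step / n_steps             # assumed ramp from 0 toward 1

        # Matrix multiplication unit: the first intermediate variables are
        # taken to be the positions themselves.
        z = J @ x                      # second intermediate variables

        # Time evolution unit: assumed update of momenta, then positions.
        y = y + dt * (-(1.0 - a) * x + c * z)
        x = x + dt * y

        # Assumed clamping: keep positions inside [-1, 1], stop at the wall.
        hit = np.abs(x) > 1.0
        x = np.where(hit, np.sign(x), x)
        y = np.where(hit, 0.0, y)

    # Output unit: the first variables at the preset end time.
    return x


# Usage: the sign of each final position is read as the spin value.
spins_ferro = np.sign(solve_ising(np.array([[0.0, 1.0], [1.0, 0.0]])))
spins_anti = np.sign(solve_ising(np.array([[0.0, -1.0], [-1.0, 0.0]])))
```

With these assumed settings, the two-spin example with a positive coupling is expected to end with both spins pointing the same way, and the example with a negative coupling with the spins opposed.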

A program for causing an information processing device to function as a computing device that solves an optimization problem using an Ising model,
wherein the information processing device comprises:
a first variable memory; and
a second variable memory,
the program causing the information processing device to function as:
a matrix multiplication unit that performs matrix multiplication of N first intermediate variables (N is an integer of 2 or more) at a first time and a coefficient matrix containing preset N rows × N columns of coefficients to calculate N second intermediate variables at the first time;
a time evolution unit that calculates, based on the N second intermediate variables at the first time, N first variables at a second time one sampling period after the first time and N second variables at the second time, writes the N first variables at the second time to the first variable memory, and writes the N second variables at the second time to the second variable memory;
a management unit that increments the time for each sampling period from the start time and causes the matrix multiplication unit and the time evolution unit to execute processing for each time;
an output unit that outputs N first variables at a preset end time;
wherein
The N first variables correspond to N spins in the Ising model,
The N second variables correspond to the N spins,
the N spins correspond to N points;
each of the N first variables represents the position of a point corresponding to a corresponding spin among the N points;
each of the N second variables represents the momentum of a point corresponding to the corresponding spin among the N points;
The N first intermediate variables correspond to the N first variables, and each of the N first intermediate variables is the corresponding first variable among the N first variables or a value obtained by multiplying the corresponding first variable by a preset coefficient,
The N second intermediate variables correspond to the N second variables.