JP4963082B2

JP4963082B2 - Complex structure prediction apparatus, method, and program

Info

Publication number: JP4963082B2
Application number: JP2007132246A
Authority: JP
Inventors: 祐一郎蓬来; 弘毅塚本; 保野口; 一彦福井
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2007-05-18
Filing date: 2007-05-18
Publication date: 2012-06-27
Anticipated expiration: 2027-05-18
Also published as: JP2008287529A

Description

The present invention relates to a complex structure prediction apparatus that predicts the complex structure formed when a plurality of three-dimensional structures, such as proteins, dock. The prediction is made by calculating a docking score that represents the interaction energy for each relative arrangement of the three-dimensional structures.

In recent years, bioinformatics, which uses computers to process vast amounts of biological information, has advanced rapidly. In this field, predicting the structure of a complex formed by multiple structures (the docking structure) is important for studying the properties of the target. Proteins are the typical structures that form complexes. Predicting complex structures is also considered important for DNA, saccharides, and the like. Moreover, complex structure prediction is considered useful beyond biotechnology, for example in carbon nanotube research. The following describes conventional complex structure prediction techniques, focusing mainly on proteins.

It is known that many proteins that form complexes keep almost the same shape as in their unbound state. In complex structure prediction, three-dimensional structure data (atomic coordinate data) for two proteins are input. The relative arrangement of the two structures is then changed little by little, the compatibility of the two structures is calculated for each relative arrangement, and candidate complex structures are obtained from the results. The compatibility calculation preferably computes a score representing the magnitude of the interaction energy between the two structures. This compatibility score is hereinafter called the docking score.

As a conventional complex structure prediction technique, a shape complementarity search using the fast Fourier transform (FFT) has been proposed. In outline, (1) each of the two sets of structure data is converted into grid data, and (2) the compatibility of the two grid data sets when superimposed is calculated. Grid data and the compatibility calculation that uses it are described below with simple figures.

FIG. 1 shows an example of grid data. Grid data is generated by discretizing the original structure data. The structure data is given as coordinate data representing the three-dimensional positions of the atoms that make up the protein. Grid data, in contrast, is data assigned to each of a plurality of grid points arranged in a cubic lattice. Converting coordinate data into grid data is called gridding.
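Gridding as described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `gridify`, the grid origin at (0, 0, 0), and the choice to store a simple atom count per cell are all assumptions made for the example.

```python
import numpy as np

def gridify(coords, n, spacing):
    """Count atoms per cell of an n x n x n cubic grid.

    coords  : (num_atoms, 3) array of atomic positions
    spacing : edge length of one grid cell, in the same unit as coords
    Assumption: the grid origin is (0, 0, 0); atoms outside the grid are dropped.
    """
    coords = np.asarray(coords, dtype=float)
    idx = np.floor(coords / spacing).astype(int)
    inside = np.all((idx >= 0) & (idx < n), axis=1)
    grid = np.zeros((n, n, n))
    # np.add.at accumulates correctly when several atoms fall into one cell
    np.add.at(grid, tuple(idx[inside].T), 1.0)
    return grid

atoms = np.array([[0.5, 0.5, 0.5],
                  [0.7, 0.2, 0.9],
                  [2.5, 1.5, 0.5],
                  [9.0, 9.0, 9.0]])   # the last atom lies outside a 4-cell grid
grid = gridify(atoms, n=4, spacing=1.0)
```

A real docking grid would usually store an energy field or typed particle amounts rather than raw counts; the count is used here only to keep the discretization step visible.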

In the example of FIG. 1, one protein is the receptor, serving as the target object, and the other protein is the ligand, serving as the query object. Here, the receptor grid data is the energy field that the receptor creates at each grid position, and the ligand grid data is the particle amount at each grid point. In the figure, grid points on the object surface are labeled "1", grid points inside the object are labeled "9i", and grid points outside the object are labeled "0".

Next, the compatibility calculation using the two grid data sets of FIG. 1 is described. The compatibility calculation computes a compatibility score for the two grid data sets when they are superimposed. As noted above, this compatibility score is called the docking score. The docking score represents the magnitude of the interaction energy between the two objects. The docking score S is given by formula (1).

In formula (1), R is the energy field of the receptor and L is the particle amount of the ligand. (l, m, n) is the position of the receptor, and (i, j, k) is the displacement of the ligand position. The docking score S is the interaction energy obtained when the receptor is held fixed and the ligand is moved by (i, j, k) in the x, y, and z directions and superimposed on it.
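Under the definitions above, the score for a single displacement would plausibly take the form S(i, j, k) = Σ R(l, m, n) · L(l+i, m+j, n+k), summed over all (l, m, n). The sketch below evaluates that sum directly. The sign convention of the shift, the cyclic wrap-around at the grid boundary, and the restriction to real-valued grids are assumptions of this example, not statements from the text.

```python
import numpy as np

def docking_score(R, L, shift):
    """Direct evaluation of sum_{l,m,n} R[l,m,n] * L[l+i, m+j, n+k].

    Assumptions: real-valued grids of equal shape, cyclic boundaries.
    """
    i, j, k = shift
    # np.roll by -i makes shifted[l] equal to L[l + i]
    shifted = np.roll(L, shift=(-i, -j, -k), axis=(0, 1, 2))
    return float(np.sum(R * shifted))

R = np.zeros((4, 4, 4))
R[1, 1, 1] = 2.0            # toy receptor field value
L = np.zeros((4, 4, 4))
L[2, 1, 1] = 3.0            # toy ligand particle amount

overlap = docking_score(R, L, (1, 0, 0))     # the two nonzero cells coincide
no_overlap = docking_score(R, L, (0, 0, 0))  # they do not coincide
```

Evaluating every displacement this way costs on the order of N³ operations per shift, which is the cost the FFT-based search is meant to avoid.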

As shown in FIG. 2, complex structure prediction calculates the docking score at each position while translating the two grid data sets relative to each other. This processing is performed at high speed using the FFT. In the example of FIG. 2, the receptor is fixed and the ligand is moved. The right side of FIG. 2 conceptually shows the arrangement in which the docking score S is highest.
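The translational scan can be sketched with the correlation theorem: one pair of forward FFTs, one element-wise product, and one inverse FFT yield the score for every shift at once. The function name `score_all_shifts` and the cyclic-boundary, real-valued-grid setting are assumptions of this sketch.

```python
import numpy as np

def score_all_shifts(R, L):
    """Return S with S[i,j,k] = sum_{l,m,n} R[l,m,n] * L[l+i,m+j,n+k].

    Assumptions: real-valued grids of equal shape, cyclic boundaries.
    """
    spectrum = np.conj(np.fft.fftn(R)) * np.fft.fftn(L)
    return np.fft.ifftn(spectrum).real

rng = np.random.default_rng(0)
R = rng.standard_normal((8, 8, 8))
L = rng.standard_normal((8, 8, 8))
S = score_all_shifts(R, L)

# Cross-check one shift against the direct sum.
i, j, k = 3, 0, 5
direct = np.sum(R * np.roll(L, shift=(-i, -j, -k), axis=(0, 1, 2)))

# Shift with the highest score, as in the right-hand side of the figure.
best_shift = np.unravel_index(np.argmax(S), S.shape)
```

In practice the grids are zero-padded so that cyclic wrap-around does not bring opposite faces of the two objects into contact; that padding is omitted here for brevity.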

In addition, as shown in FIG. 3, the two structures are rotated relative to each other. In the illustrated example, the ligand is rotated. FIG. 3 shows the ligand rotated in 90-degree steps, but in practice the rotation steps are of course much finer. The rotation is performed explicitly by rotating the object's coordinate data. Grid data is generated at each angle as the coordinate data is rotated, and the translational search of FIG. 2 is then performed on that grid data.
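The explicit rotation of coordinate data can be sketched as below. For brevity the example rotates about the z axis only and about the centroid of the atoms; both choices, and the function name `rotate_about_z`, are assumptions. A full search would sample three-dimensional orientations and re-grid after each rotation.

```python
import numpy as np

def rotate_about_z(coords, angle_deg):
    """Rotate atomic coordinates about the z axis through their centroid."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    center = coords.mean(axis=0)
    # Row vectors are rotated by multiplying with the transpose
    return (coords - center) @ rot.T + center

ligand = np.array([[1.0, 0.0, 0.0],
                   [-1.0, 0.0, 0.0]])
rotated = rotate_about_z(ligand, 90.0)
```

Because rotation is applied to the coordinates rather than to the grid, the grid data has to be regenerated for every orientation before the translational FFT scan is repeated.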

Through the translation of FIG. 2 and the rotation of FIG. 3, the relative arrangement of the two structures is varied in fine steps, and the docking score is calculated for each relative arrangement. Then, for example, a predetermined number of complex structures with the highest docking scores are obtained and output as the prediction result.
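Picking a predetermined number of top-scoring arrangements from a score grid might look like the following. The function name `top_candidates` and the toy score values are illustrative assumptions; a full search would also record the rotation that produced each score grid.

```python
import numpy as np

def top_candidates(S, count):
    """Return the `count` highest-scoring shifts as ((i, j, k), score) pairs."""
    flat = np.argsort(S, axis=None)[::-1][:count]   # flattened indices, best first
    shifts = np.unravel_index(flat, S.shape)
    return [(tuple(int(x) for x in idx), float(S[idx])) for idx in zip(*shifts)]

S = np.zeros((3, 3, 3))
S[1, 2, 0] = 5.0
S[0, 0, 1] = 3.0
candidates = top_candidates(S, count=2)
```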

Although FIGS. 1 to 3 show the grid data in two dimensions, the actual grid data is of course three-dimensional, and translation and rotation are likewise performed in three dimensions.

Next, a calculation example using the conventional prediction technique is described further. ZDOCK, disclosed in Non-Patent Document 1, is a known conventional example. In this prior art, an approximate value of the electrostatic interaction energy obtained when the two grid data sets are shifted and superimposed is calculated by formula (2).

In formula (2), E(l, m, n) is the electrostatic field of the target object and q(l, m, n) is the charge of the query object. S(i, j, k) is the electrostatic interaction energy when the query object is shifted by (i, j, k) relative to the target object.

In ZDOCK, the sum of an electrostatic interaction score and a shape complementarity score is used as the compatibility score, that is, the docking score. Specifically, the sum of S(i, j, k) in formula (2) and S(i, j, k) in formula (1) above is used as the docking score, and the complex structure is searched for using this score. In the search, translation and rotation are performed as already described, and the score at each position is obtained. In the translational search, a convolution is computed by FFT so that the docking score S is obtained at every (i, j, k) within the range of movement. In the rotational search, one of the objects is rotated explicitly.

FIG. 4 is a block diagram showing a configuration for calculating the score according to formula (2). Because the target object is held still and the query object is moved, the former is labeled Static and the latter Mobile. In the score calculation, an FFT is applied to each of the target object and the query object, the two transformed data sets are multiplied, and an inverse Fourier transform (IFFT) is then applied to obtain the score. As illustrated, the query object (Mobile) is rotated, and each time it is rotated, the processing enclosed by the dotted line (the Fourier transform of the query object, the subsequent multiplication, and the inverse Fourier transform) is repeated. In this way, the search over rotation and translation is carried out.

Complex structure prediction is basically performed according to the principle described above. However, multiplication of low-dimensional complex or real numbers can express only very simple models, so high prediction accuracy is hard to achieve.

Higher-dimensional processing is therefore called for. As a prediction technique that meets this need, a method using pair potentials has been proposed. In this technique, as described below, a pair potential matrix is introduced, and the docking score is calculated by processing vectors and matrices.

FIG. 5 shows an example of a pair potential matrix. This pair potential matrix is disclosed in Non-Patent Document 2. In a pair potential matrix, the elements that can constitute a structure are classified into a plurality of types. In this example, the atoms that make up a protein are classified into 18 types. The interaction energies of all pairs that can be formed from these 18 types are arranged in matrix form as illustrated. In the illustrated example, the interaction energy (potential value) is the change in energy when the water molecules around an atom of one type are removed and replaced with another atom. For example, the energy change when the water molecules around a type 1 atom are replaced with a type 2 atom is the potential value of the type 1 and type 2 pair.

When a pair potential matrix is used, the distance between elements is also taken into account. Here, as illustrated, an atom pair is assumed to have the interaction energy given by the pair potential matrix when the two atoms are within a predetermined cutoff distance (for example, 6 angstroms) of each other. Thus, for example, when a type 2 atom and a type 3 atom are within the cutoff distance, the pair potential matrix gives an interaction energy of −0.850. The variation of the interaction energy with the distance between the atoms of a pair may also be taken into account.
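The cutoff rule can be sketched for explicit atoms as follows. The 3 × 3 matrix is a toy stand-in, not the 18-type matrix of the cited reference: only the −0.850 entry for the type 2 and type 3 pair comes from the text, and the remaining entries are left at zero as placeholders. Types are 0-based in the code, so types 2 and 3 become indices 1 and 2. The function name `pair_energy` is an assumption.

```python
import numpy as np

def pair_energy(coords_a, types_a, coords_b, types_b, A, cutoff=6.0):
    """Sum A[type_a, type_b] over all atom pairs within the cutoff distance."""
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)               # (num_a, num_b) distances
    within = dist <= cutoff
    energies = A[types_a[:, None], types_b[None, :]]  # potential for every pair
    return float(np.sum(energies[within]))

A = np.zeros((3, 3))          # placeholder matrix, symmetric
A[1, 2] = A[2, 1] = -0.850    # type 2 / type 3 pair from the text

coords_a = np.array([[0.0, 0.0, 0.0]])
types_a = np.array([1])                       # one type 2 atom
coords_b = np.array([[3.0, 0.0, 0.0],         # within 6 angstroms
                     [10.0, 0.0, 0.0]])       # beyond the cutoff
types_b = np.array([2, 2])                    # two type 3 atoms

energy = pair_energy(coords_a, types_a, coords_b, types_b, A)
```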

As another example of a pair potential, the potential value can be expressed by a quadratic-form score. In this case, each atom is classified into one of, for example, 20 to 200 types, and the pair potential is expressed, for example, as the sum of a Lennard-Jones potential and a Coulomb potential. If an atom of type i and an atom of type j are separated by a distance d, the potential value is given by formula (3).

In formula (3), the first and second terms correspond to the Lennard-Jones potential, and the third term corresponds to the Coulomb potential. Here, "1/d^12" in the first matrix of the first term is the i-th element of that matrix, and "1" in the third matrix of the first term is the j-th element of that matrix. Similarly, "1/d^6" in the first matrix of the second term is the i-th element of that matrix, and "1" in the third matrix is the j-th element of that matrix.

Next, score calculation on a grid using pair potentials is described. As in the examples of FIGS. 1 to 3, the score for docking a receptor and a ligand is calculated. The receptor grid data is expressed as a vector; formula (4) represents the gridding of the receptor.

The ligand grid data is likewise expressed as a vector; formula (5) represents the gridding of the ligand.

In this case, the partial energy when receptor grid point (i, j, k) and ligand grid point (l, m, n) are superimposed is given by formula (6), where the matrix A is the pair potential matrix.

The sum of the contributions of all grid points then gives the docking score S of the receptor and the ligand, as shown in formula (7).

In formula (7), the docking score S is the score obtained when the receptor is placed at (l, m, n) and the ligand is placed with a displacement of (i, j, k) from the receptor.
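Formulas (6) and (7) are not reproduced in this text. Based only on the surrounding definitions, a plausible form is sketched below as an assumption, not as the patent's exact notation. It takes R(l, m, n) and L(l, m, n) to be the per-type grid vectors of formulas (4) and (5), and uses the index convention of formula (7), in which (l, m, n) runs over receptor grid points and (i, j, k) is the ligand displacement.

```latex
\begin{align}
E &= R(l,m,n)^{T}\, A \, L(l+i,\, m+j,\, n+k) \\
S(i,j,k) &= \sum_{l,m,n} R(l,m,n)^{T}\, A \, L(l+i,\, m+j,\, n+k)
\end{align}
```

Written this way, each grid point contributes a bilinear form in the two type vectors, and the docking score is the sum of those contributions over the grid.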

Next, the method of Dima et al. (Non-Patent Document 3), which is well suited to the score calculation above, is described. In this method, the docking score formula is transformed as follows.

First, noting that the pair potential matrix A is a symmetric matrix, A is decomposed as shown in formula (8). An eigenvalue decomposition is performed here.

U is an orthogonal matrix (the eigenvectors), and D is a diagonal matrix (the eigenvalues). Furthermore, the pair potential matrix A is approximated as shown in formula (9) by discarding the dimensions of the diagonal matrix D whose eigenvalues are small. Specifically, the eigenvalues from the (r+1)-th largest onward are set to 0.

In the matrix, the eigenvalues from row 1, column 1 through row r, column r remain, and the remaining entries (including the eigenvalues from row r+1, column r+1 onward) may be set to 0. The value r is the number of eigenvalues of the diagonal matrix that are retained in the approximation, and can be called the approximate eigenvalue number or the approximate dimension number.
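The decomposition and truncation can be sketched with NumPy as follows. The random symmetric 18 × 18 matrix stands in for a real pair potential matrix, and ranking the eigenvalues by absolute value is an assumption of this example (the text only says that the eigenvalues after the r-th are set to 0). The function name `truncate_symmetric` is hypothetical.

```python
import numpy as np

def truncate_symmetric(A, r):
    """Eigendecompose symmetric A and keep only the r largest-magnitude eigenvalues.

    Returns U (eigenvectors as columns) and the truncated eigenvalue vector D_r,
    so that A is approximated by U @ diag(D_r) @ U.T.
    """
    eigvals, U = np.linalg.eigh(A)
    order = np.argsort(np.abs(eigvals))[::-1]   # assumption: rank by |eigenvalue|
    eigvals, U = eigvals[order], U[:, order]
    D_r = np.zeros_like(eigvals)
    D_r[:r] = eigvals[:r]
    return U, D_r

rng = np.random.default_rng(1)
M = rng.standard_normal((18, 18))
A = (M + M.T) / 2                       # stand-in symmetric matrix

U, D_r = truncate_symmetric(A, r=4)
A_approx = U @ np.diag(D_r) @ U.T       # rank-4 approximation of A

U_full, D_full = truncate_symmetric(A, r=18)
A_exact = U_full @ np.diag(D_full) @ U_full.T   # keeping all 18 recovers A
```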

The approximated pair potential matrix is split into a left matrix UD_r and a right matrix U^T. The receptor's gridded vector R is multiplied by the left matrix UD_r, and the right matrix U^T is multiplied by the ligand's gridded vector L (formula (10)). The former product is the first grid data R′, and the latter is the second grid data L′.

Using the first grid data R′ (receptor side) and the second grid data L′ (ligand side), the docking score is expressed as in formula (11).

The docking score is originally expressed as the product of the receptor's gridded vector data, the pair potential matrix, and the ligand's gridded vector data. In the method of Dima et al., the pair potential matrix in the docking score formula is eigendecomposed, approximated, and split into a left matrix and a right matrix. When the product of the receptor and the left matrix is replaced with the first grid data R′, and the product of the right matrix and the ligand is replaced with the second grid data L′, the formula above is obtained. The docking score is thus expressed as the product of the first grid data R′ and the second grid data L′. In this transformed formula, the elements of the first grid data R′ from dimension r+1 onward are 0 as a result of the approximation, so the docking score is obtained as the sum of r real convolutions. As explained in connection with formula (9), r is the approximate eigenvalue number or approximate dimension number; specifically, it is the number of eigenvalues retained when the diagonal matrix obtained by eigendecomposing the pair potential matrix is approximated.
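Formulas (8) to (11) are not reproduced in this text. The chain of steps described above can be written out as follows; this is a reconstruction from the prose under stated assumptions (λ_1, …, λ_r denote the retained eigenvalues, R′_p and L′_p denote the p-th components of R′ and L′, and the shift convention is the same one assumed for the docking score), not the patent's exact notation.

```latex
\begin{align}
A &= U D U^{T} \\
A &\approx U D_r U^{T}, \qquad D_r = \operatorname{diag}(\lambda_1, \dots, \lambda_r, 0, \dots, 0) \\
R' &= R^{T} U D_r, \qquad L' = U^{T} L \\
S(i,j,k) &= \sum_{l,m,n} R'(l,m,n)\, L'(l+i,\, m+j,\, n+k)
          = \sum_{p=1}^{r} \sum_{l,m,n} R'_p(l,m,n)\, L'_p(l+i,\, m+j,\, n+k)
\end{align}
```

The last equality is where the saving comes from: components p > r of R′ vanish, so only r scalar correlations remain, each of which can be evaluated by FFT.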

FIG. 6 shows a configuration for calculating the docking score by the method of Dima et al. As in the earlier example, the first grid data R′ on the receptor side is fixed and the second grid data L′ on the ligand side is moved, so in the figure the former is labeled Static and the latter Mobile. The score calculation involves r forward FFTs and one inverse FFT. More specifically, the FFT is applied to Static and the FFT result is stored. The FFT is applied to Mobile r times. Each FFT result for Mobile is multiplied by the corresponding FFT result for Static. The r products obtained in this way are summed, and an inverse FFT is then applied to obtain the docking score.
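The pipeline described for FIG. 6 can be sketched as follows: r component-wise products in Fourier space are accumulated and a single inverse FFT produces the score grid. The per-type grid layout `(T, N, N, N)`, the random toy data, the cyclic boundaries, and the function name `score_rank_r` are assumptions of this sketch. For simplicity the Static transforms are computed inside the loop rather than stored in advance.

```python
import numpy as np

def score_rank_r(R, L, U, d, r):
    """Docking scores for all shifts using a rank-r pair potential.

    R, L : (T, N, N, N) per-type grids for receptor (Static) and ligand (Mobile)
    U, d : eigenvectors (columns) and eigenvalues, ordered so the kept ones come first
    """
    Rp = np.einsum("tp,txyz->pxyz", U[:, :r] * d[:r], R)   # R' = R^T U D_r
    Lp = np.einsum("tp,txyz->pxyz", U[:, :r], L)           # L' = U^T L
    total = np.zeros(R.shape[1:], dtype=complex)
    for p in range(r):                                     # r forward-FFT products
        total += np.conj(np.fft.fftn(Rp[p])) * np.fft.fftn(Lp[p])
    return np.fft.ifftn(total).real                        # one inverse FFT

rng = np.random.default_rng(2)
T, N, r = 5, 6, 2
M = rng.standard_normal((T, T))
A = (M + M.T) / 2
d, U = np.linalg.eigh(A)
order = np.argsort(np.abs(d))[::-1]
d, U = d[order], U[:, order]
R = rng.random((T, N, N, N))
L = rng.random((T, N, N, N))
S = score_rank_r(R, L, U, d, r)

# Cross-check one shift against the direct bilinear sum with the rank-r matrix.
A_r = (U[:, :r] * d[:r]) @ U[:, :r].T
i, j, k = 2, 1, 4
L_shift = np.roll(L, shift=(-i, -j, -k), axis=(1, 2, 3))
direct = np.einsum("sxyz,st,txyz->", R, A_r, L_shift)
```

The cross-check confirms that summing r scalar correlations reproduces the full vector-matrix-vector score for the truncated matrix.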

The score calculation of FIG. 6 requires memory for storage and for working space. The minimum required memory can be estimated as follows. Let N be the size of the three-dimensional FFT along one dimension; N is the number of grid points. In other words, the score calculation unit 33 searches for the docking structure in a grid space of N × N × N. In this case, an area of r × N × N × N is needed to store the FFT results for the Static part. An area of N × N × N is needed as the FFT working area, and a further area of N × N × N is needed to store the sum. In total, an area of (r + 2) × N × N × N is required.

Here, N (the number of grid points) is, in practice, for example 50 to 350. With N = 350, the memory area corresponding to N × N × N is approximately 200 MB. In this case, a memory area of at least about (r + 2) × N × N × N = (r + 2) × 200 MB is required.

Non-Patent Document 1: Chen R, Li L, Weng Z, "ZDOCK: An Initial-stage Protein-Docking Algorithm", Proteins: Structure, Function, and Bioinformatics, 52:80-87 (2003)
Non-Patent Document 2: Chao Zhang, George Vasmatzis, James L. Cornette and Charles DeLisi, "Determination of Atomic Desolvation Energies From the Structures of Crystallized Proteins", J. Mol. Biol. (1997) 267, pp. 707-726
Non-Patent Document 3: Dima Kozakov, Ryan Brenke, Stephen R. Comeau, and Sandor Vajda, "PIPER: An FFT-Based Protein Docking Program with Pairwise Potentials", Proteins: Structure, Function, and Bioinformatics, 65:392-406 (2006)

However, in conventional complex structure prediction techniques that use a pair potential matrix, the amount of computation remains large even though the pair potential matrix is approximated, as in the method of Dima et al. To keep the amount of computation from becoming enormous, the size of the usable pair potential matrix is restricted. This lowers the accuracy of the score and hence of the prediction, and it also narrows the range of applications of the prediction technique.

The range of applications of the prediction technique is explained further. Consider, for example, predicting the complex structure of a sugar, as a substance other than a protein. Because sugars are built from many kinds of elements, the pair potential matrix becomes large and the amount of computation becomes enormous. For this reason, predicting the complex structure of a sugar using a pair potential matrix is currently not easy. In fact, complex structure prediction with a pair potential matrix is currently limited to proteins, and pair potential matrices cannot be used for substances other than proteins. To make complex structures of non-protein substances predictable as well, it is desirable to be able to use pair potential matrices with more types, and this too calls for a reduction in the amount of computation.

The prior art also has the drawback that the score calculation requires a large amount of memory. As explained with reference to FIG. 6, the required memory size is (r + 2) × N × N × N, which grows in proportion to r. Here r is the approximate eigenvalue number or approximate dimension number described above, that is, the number of eigenvalues retained when the diagonal matrix obtained by decomposing the pair potential matrix is approximated. Increasing r raises the score accuracy (prediction accuracy) but makes the memory size enormous. Conversely, decreasing r keeps the memory size down but lowers the score accuracy. In conventional calculations, r is set small, at the expense of accuracy, so that the score calculation can run in a memory-constrained environment. For example, when the 18-dimensional pair potential matrix mentioned above is applied, r is set to 2 or 4, so that the 18-dimensional matrix is approximated by a 2-dimensional or 4-dimensional matrix. It is desirable to solve this problem and obtain high accuracy by increasing r (the approximate eigenvalue number or approximate dimension number) without increasing the memory size.
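The growth of the (r + 2) × N × N × N requirement with r can be made concrete with a small calculation. The 4 bytes per grid value used below is an assumption (the text only states that one N × N × N block at N = 350 is roughly 200 MB), and the function name `min_memory_bytes` is hypothetical.

```python
def min_memory_bytes(r, n, bytes_per_value=4):
    """Memory for r stored Static FFT results, one FFT work area, and one sum area."""
    return (r + 2) * n ** 3 * bytes_per_value

# Compare the small r values used conventionally with the full 18 dimensions.
requirements_mb = {r: min_memory_bytes(r, 350) / 1e6 for r in (2, 4, 18)}
```

Under this assumption a single N × N × N block at N = 350 takes about 171.5 MB, so keeping all 18 dimensions needs roughly five times the memory of the r = 2 setting.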

The present invention has been made against this background. One object of the present invention is to provide a complex structure prediction technique that can calculate the docking score with a small amount of computation and that makes pair potentials with more types usable. Another object of the present invention is to make it possible to calculate the docking score with higher accuracy using a smaller memory size.

The complex structure prediction apparatus of the present invention predicts the complex structure formed when a plurality of three-dimensional structures dock, by calculating a docking score representing the interaction energy according to the relative arrangement of the plurality of three-dimensional structures. The apparatus comprises: a structure input unit that inputs a first input structure and a second input structure, which are the plurality of three-dimensional structures to be docked; a pair potential matrix storage unit that stores a pair potential matrix in which the elements that can constitute a three-dimensional structure are classified into a plurality of types and the interaction energies of arbitrary pairs formed from the plurality of types are arranged; a pair potential matrix transformation unit that compresses the pair potential matrix based on the first input structure and the second input structure to generate a compressed pair potential matrix in which the interaction energies of arbitrary pairs formed from any element appearing in the first input structure and any element appearing in the second input structure are arranged; a grid processing unit that uses the first input structure, the second input structure, and the compressed pair potential matrix to convert the first input structure and the second input structure and generate first structure grid data and second structure grid data for score calculation, each consisting of a collection of per-grid data; and a score calculation unit that calculates the docking score of the first input structure and the second input structure from the first structure grid data and the second structure grid data.

As described above, according to the present invention, the pair potential matrix is compressed and the docking score is calculated using the compressed pair potential matrix. The amount of computation in the docking score calculation can therefore be reduced. Because the amount of computation is reduced, pair potentials with more types can also be processed.
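One way to read the compression step is as a selection of the rows and columns of the pair potential matrix that correspond to types actually present in each input structure. The sketch below illustrates that reading; it is an interpretation, not the patent's stated implementation, and the function name `compress_pair_potential`, the random stand-in matrix, and the 0-based type indices are assumptions.

```python
import numpy as np

def compress_pair_potential(A, types_first, types_second):
    """Keep only rows for types in the first structure and columns for types in the second."""
    rows = np.unique(types_first)     # types appearing in the first input structure
    cols = np.unique(types_second)    # types appearing in the second input structure
    return A[np.ix_(rows, cols)], rows, cols

rng = np.random.default_rng(3)
M = rng.standard_normal((18, 18))
A = (M + M.T) / 2                          # stand-in 18-type pair potential matrix

types_first = np.array([0, 3, 3, 7])       # atom types found in the first structure
types_second = np.array([3, 5])            # atom types found in the second structure
A_c, rows, cols = compress_pair_potential(A, types_first, types_second)
```

Note that the compressed matrix is in general rectangular and no longer symmetric, which is consistent with the text's later use of singular value decomposition rather than eigendecomposition.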

The pair potential matrix transformation unit may further perform a singular value decomposition that decomposes the compressed pair potential matrix into a first orthogonal matrix, a diagonal matrix, and a second orthogonal matrix, and may perform an approximation that sets some of the eigenvalues of the diagonal matrix to 0 so that only the approximate eigenvalue number of eigenvalues remains.
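The decomposition and truncation of a rectangular compressed matrix can be sketched with `numpy.linalg.svd`. The diagonal entries, which the text calls eigenvalues, are singular values in SVD terminology and are returned in descending order. The random 6 × 4 matrix and the function name `truncated_svd` are assumptions of this example.

```python
import numpy as np

def truncated_svd(A_c, r):
    """Decompose A_c = U diag(s) Vt and zero all but the r largest singular values."""
    U, s, Vt = np.linalg.svd(A_c, full_matrices=False)
    s_r = s.copy()
    s_r[r:] = 0.0
    return U, s_r, Vt

rng = np.random.default_rng(4)
A_c = rng.standard_normal((6, 4))      # stand-in compressed pair potential matrix
U, s_r, Vt = truncated_svd(A_c, r=2)
A_approx = (U * s_r) @ Vt              # first matrix, truncated diagonal, second matrix

# The spectral-norm error of the best rank-2 approximation is the third singular value.
third_singular_value = np.linalg.svd(A_c, compute_uv=False)[2]
error = np.linalg.norm(A_c - A_approx, 2)
```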

The grid processing unit may use the first orthogonal matrix, the approximated diagonal matrix, and the second orthogonal matrix as the compressed pair potential matrix, generating the first structure grid data from the first input structure, the first orthogonal matrix, and the approximated diagonal matrix, and generating the second structure grid data from the second input structure and the second orthogonal matrix.

For the combination of the first structure grid data and the second structure grid data, the score calculation unit may perform FFT processing and multiplication processing a number of times corresponding to the approximate eigenvalue number, which is the number of eigenvalues left by the approximation process, and may obtain the sum of the results.

The score calculation unit may calculate the score using a first memory area for performing FFT processing on the combination of the first structure grid data and the second structure grid data, and a second memory area for obtaining the sum of the processing results.

また、前記スコア算出部は、前記第１構造グリッドデータと前記第２構造グリッドデータを相対的に移動したときの各々の相対配置におけるドッキングスコアを算出してよい。 The score calculation unit may calculate a docking score in each relative arrangement when the first structural grid data and the second structural grid data are relatively moved.

また、複合体構造予測装置は、前記第１入力構造および前記第２入力構造の少なくとも一方をグリッド化前に回転する構造回転部を有してよい。 Moreover, the composite structure prediction apparatus may include a structure rotating unit that rotates at least one of the first input structure and the second input structure before being gridded.

また、複合体構造予測装置は、前記スコア算出部により算出された各々の相対配置におけるドッキングスコアを比較して、前記複合体構造の候補を選定する候補選定部を有してよい。 In addition, the complex structure prediction apparatus may include a candidate selection unit that compares the docking scores in the respective relative arrangements calculated by the score calculation unit and selects the complex structure candidates.

Another aspect of the present invention is a complex structure prediction method that predicts the complex structure formed when a plurality of three-dimensional structures are docked, by calculating a docking score representing an interaction energy that depends on the relative arrangement of the three-dimensional structures. The method inputs a first input structure and a second input structure, which are the three-dimensional structures to be docked; reads, from a pair potential matrix storage unit, a pair potential matrix in which the elements that can constitute a three-dimensional structure are classified into a plurality of types and the interaction energies of arbitrary pairs formed from those types are arranged; compresses the pair potential matrix based on the first input structure and the second input structure to generate a compressed pair potential matrix in which the interaction energies of arbitrary pairs formed from any element appearing in the first input structure and any element appearing in the second input structure are arranged; converts the first input structure and the second input structure, using the first input structure, the second input structure, and the compressed pair potential matrix, to generate first structure grid data and second structure grid data for score calculation, each consisting of a collection of per-grid data; and calculates the docking score of the first input structure and the second input structure from the first structure grid data and the second structure grid data.

Another aspect of the present invention is a complex structure prediction program that causes a computer to execute processing for predicting the complex structure formed when a plurality of three-dimensional structures are docked, by calculating a docking score representing an interaction energy that depends on the relative arrangement of the three-dimensional structures. The program causes the computer to execute processing that acquires a first input structure and a second input structure, which are the three-dimensional structures to be docked; reads, from a pair potential matrix storage unit, a pair potential matrix in which the elements that can constitute a three-dimensional structure are classified into a plurality of types and the interaction energies of arbitrary pairs formed from those types are arranged; compresses the pair potential matrix based on the first input structure and the second input structure to generate a compressed pair potential matrix in which the interaction energies of arbitrary pairs formed from any element appearing in the first input structure and any element appearing in the second input structure are arranged; converts the first input structure and the second input structure, using the first input structure, the second input structure, and the compressed pair potential matrix, to generate first structure grid data and second structure grid data for score calculation, each consisting of a collection of per-grid data; and calculates the docking score of the first input structure and the second input structure from the first structure grid data and the second structure grid data.

As described above, the present invention compresses the pair potential matrix based on the first input structure and the second input structure and calculates the docking score using the compressed pair potential matrix. The docking score can therefore be calculated with a small amount of computation, and pair potentials with a larger number of types can also be handled.

以下、本発明の実施の形態に係る複合体構造予測装置について、図面を用いて説明する。以下では、背景技術として説明した事項と共通する事項については、説明を適宜省略する。 Hereinafter, a complex structure prediction apparatus according to an embodiment of the present invention will be described with reference to the drawings. In the following, description of matters common to those described as background art will be omitted as appropriate.

まず、本発明の複合体構造予測装置による構造予測の原理について説明する。本発明では第１入力構造と第２入力構造とがドッキングされる。ここでは、例として、背景技術での説明と同様にタンパク質のドッキングを取り上げ、第１入力構造がレセプタであり、第２入力構造がリガンドであるとする。 First, the principle of structure prediction by the composite structure prediction apparatus of the present invention will be described. In the present invention, the first input structure and the second input structure are docked. Here, as an example, protein docking is taken up as in the background art, and the first input structure is a receptor and the second input structure is a ligand.

In the prior-art method of Dima et al., the pair potential matrix is eigenvalue-decomposed, grid data is generated from the result, and the docking score is calculated by convolution.

これに対して、本発明では、まず、ペアポテンシャル行列が下記のようにして圧縮される。圧縮処理は、ドッキングされるべき２つの物体のデータに基づいて、すなわち、入力されたレセプタとリガンドのデータに基づいて行われる。 In contrast, in the present invention, the pair potential matrix is first compressed as follows. The compression process is performed based on data of two objects to be docked, that is, based on input receptor and ligand data.

FIG. 7 illustrates the compression process using a simple example. The pair potential matrix A before compression is a 5×5 matrix. Suppose that only atoms of types 1, 4, and 5 appear in the receptor and only atoms of types 1 and 3 appear in the ligand. In this case, the pair potential matrix A is compressed into a matrix A' containing every combination of types 1, 4, 5 with types 1, 3. In the figure, dotted lines are drawn through the rows corresponding to receptor types 1, 4, and 5 and through the columns corresponding to ligand types 1 and 3; the compression process keeps the values at the intersections of the dotted lines. The result, as shown, is a 3×2 matrix, so the dimensions of the matrix are reduced.
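The row and column selection described for FIG. 7 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `compress_pair_potential` and the placeholder matrix values (entry (i, j) encoded as 10·i + j so that the surviving rows and columns are easy to see) are assumptions made here.

```python
def compress_pair_potential(A, receptor_types, ligand_types):
    """Keep the rows of types present in the receptor and the columns of
    types present in the ligand. Types are 1-based, as in the FIG. 7 example."""
    rows = sorted(set(receptor_types))
    cols = sorted(set(ligand_types))
    return [[A[i - 1][j - 1] for j in cols] for i in rows]

# Hypothetical 5x5 matrix: entry (i, j) is 10*i + j (placeholder values only).
A = [[10 * i + j for j in range(1, 6)] for i in range(1, 6)]

# Atom types observed in the two input structures (repeats are allowed).
A_compressed = compress_pair_potential(A, [1, 4, 5, 4, 1], [3, 1, 1])
# A_compressed is the 3x2 matrix [[11, 13], [41, 43], [51, 53]]
```

Only the set of types that occur matters; how many atoms of each type occur does not change the compressed matrix.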

上記において、圧縮前のペアポテンシャル行列は、タンパク質を構成する可能性のあるすべての原子タイプにより作られるすべての組合せのポテンシャル値を含んでいる。これに対して、圧縮処理は、レセプタを実際に構成する任意の原子とリガンドを実際に構成する任意の原子で作られる任意のペアのポテンシャルからなる行列へと、ペアポテンシャル行列を変形する。つまり、予測対象のレセプタおよびリガンドに実際に出現する原子により作られ得るペアのポテンシャル値のみを持つ行列になるように、ペアポテンシャル行列が圧縮される。 In the above, the pair potential matrix before compression includes all combinations of potential values created by all atom types that may constitute the protein. On the other hand, in the compression process, the pair potential matrix is transformed into a matrix composed of an arbitrary pair of potentials made of arbitrary atoms that actually constitute the receptor and arbitrary atoms that actually constitute the ligand. That is, the pair potential matrix is compressed so as to have a matrix having only pair potential values that can be created by atoms that actually appear in the receptor and ligand to be predicted.

言い換えれば、圧縮処理は、非出現タイプを削除する処理であり、すなわち、ドッキングすべきレセプタとリガンドに出現し得ない組合せのポテンシャルを削除する。このような圧縮処理により、予測対象のタンパク質には存在しないためにスコア計算で不必要なペアのポテンシャルが削除されて、ペアポテンシャル行列の次元を小さくできる。 In other words, the compression process is a process of deleting the non-appearance type, that is, the potential of the combination that cannot appear in the receptor and ligand to be docked is deleted. By such compression processing, the potential of the pair unnecessary for the score calculation because it does not exist in the protein to be predicted is deleted, and the dimension of the pair potential matrix can be reduced.

上記の圧縮処理により得られる小型化されたペアポテンシャル行列を、圧縮ペアポテンシャル行列Ａ’という。圧縮ペアポテンシャル行列Ａ’は以下のようにして処理される。 The reduced pair potential matrix obtained by the above compression process is referred to as a compressed pair potential matrix A ′. The compressed pair potential matrix A 'is processed as follows.

The prior art performed eigenvalue decomposition by exploiting the fact that the pair potential matrix is square and symmetric. The compressed pair potential matrix A', however, is in general neither square nor symmetric, so it cannot be eigenvalue-decomposed. In the present invention, therefore, the compressed pair potential matrix A' is subjected to singular value decomposition (SVD) as in equation (12).
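The image of equation (12) is not reproduced in this text. Judging from the definitions that follow it (orthogonal matrices U and V, a diagonal matrix D, and a right matrix V^T), it presumably has the standard form of a singular value decomposition; the reconstruction below is an assumption, not a copy of the original equation.

```latex
A' = U D V^{T} \tag{12}
```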

U and V are orthogonal matrices (the left and right eigenvectors), and D is a diagonal matrix (the eigenvalues); strictly speaking, these are the singular vectors and singular values of A'. U and V correspond to the first orthogonal matrix and the second orthogonal matrix of the present invention. The compressed pair potential matrix A' is approximated in the same way as in the prior art: the eigenvalues of the diagonal matrix D from the (r+1)-th largest onward are set to 0, so that the dimensions of D with small eigenvalues are discarded (equation (13)).
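The image of equation (13) is likewise missing from this text. Based on the description (the r largest values of D are kept and the rest are set to 0), a plausible reconstruction is the following; the symbols d_1, ..., d_r for the retained diagonal entries are notation introduced here.

```latex
A' \approx U D_r V^{T}, \qquad
D_r = \mathrm{diag}(d_1, \dots, d_r, 0, \dots, 0), \quad
d_1 \ge d_2 \ge \dots \ge d_r \tag{13}
```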

In the actual matrix, the eigenvalues from row 1, column 1 through row r, column r remain, and all other entries (including the eigenvalues from row r+1, column r+1 onward) may be 0. Here r is the number of eigenvalues left by the approximation process. The same term is used for r as in the description of the prior art: r is called the approximate eigenvalue number or the approximate dimension number (it may also be called the approximation count, the number of retained eigenvalues, and so on).
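The truncation to r retained values can be sketched with NumPy's SVD routine. This is an illustrative sketch under assumptions: the helper name `truncated_factors` and the numeric values of the 3×2 matrix are made up here, and dropping the zeroed columns is equivalent to keeping them as zeros.

```python
import numpy as np

def truncated_factors(A_compressed, r):
    """Return (U D_r, V^T) restricted to the r largest singular values."""
    U, d, Vt = np.linalg.svd(A_compressed, full_matrices=False)
    left = U[:, :r] * d[:r]   # scales column k of U by d[k]; shape (m, r)
    right = Vt[:r, :]         # first r rows of V^T; shape (r, n)
    return left, right

# Hypothetical compressed pair potential matrix (3 receptor types x 2 ligand types).
A_compressed = np.array([[-0.4, 0.1],
                         [0.3, -0.2],
                         [0.0, 0.5]])

left, right = truncated_factors(A_compressed, r=1)
approx = left @ right                       # rank-1 approximation of A'
full_left, full_right = truncated_factors(A_compressed, r=2)  # nothing discarded
```

With r equal to the full rank, the product of the two factors reproduces the compressed matrix; with smaller r, the spectral-norm error equals the largest discarded singular value.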

The approximated compressed pair potential matrix A' is split into a left matrix UDr and a right matrix V^T. The gridded vector R of the receptor is multiplied by the left matrix UDr to obtain the first grid data R'. Likewise, the right matrix V^T is multiplied by the gridded vector L of the ligand to obtain the second grid data L' (equation (14)).
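The image of equation (14) is not reproduced here. Following the wording of the paragraph, and treating the receptor vector as a row vector and the ligand vector as a column vector at each grid point (an assumption about the original layout), it can be reconstructed as:

```latex
R' = R \, U D_r, \qquad L' = V^{T} L \tag{14}
```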

The first grid data R' and the second grid data L' correspond to the first structure grid data and the second structure grid data of the present invention, respectively. Using the first grid data R' (receptor side) and the second grid data L' (ligand side), the docking score is expressed as in equation (15).
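The image of equation (15) is also missing. The surrounding text says that the score is a product of the two grid data sets and reduces to a sum of r real convolutions, which suggests a form like the one below. The index names, the translation vector (a, b, c), and the sign convention of the shift are assumptions introduced here.

```latex
S(a,b,c) \approx \sum_{k=1}^{r} \sum_{x,y,z}
  R'_k(x,y,z)\, L'_k(x+a,\, y+b,\, z+c) \tag{15}
```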

The docking score is originally expressed as the product of the gridded vector data of the receptor, the pair potential matrix, and the gridded vector data of the ligand (see equation (7)). The pair potential matrix is compressed, approximated, and split into a left matrix and a right matrix. Replacing the product of the receptor and the left matrix with the first grid data R', and the product of the right matrix and the ligand with the second grid data L', yields the equation above. The docking score is thus expressed as the product of the first grid data and the second grid data. In this transformed equation, the elements of the first grid data R' from the (r+1)-th dimension onward are 0 as a result of the approximation process, so the docking score is obtained as the sum of r real convolutions. As explained in connection with equation (13), r is the approximate eigenvalue number or approximate dimension number, that is, the number of eigenvalues retained when approximating the diagonal matrix obtained by singular value decomposition of the pair potential matrix.
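The replacement of the three-factor product by a product of the two transformed vectors can be checked numerically at a single grid point. This is a sketch only: the matrix values and the per-type counts `R` and `L` are hypothetical, and `score_with_r_terms` is a name introduced here.

```python
import numpy as np

# Hypothetical compressed pair potential matrix and per-type atom counts
# at one receptor grid point (R) and one ligand grid point (L).
A_c = np.array([[-0.4, 0.1],
                [0.3, -0.2],
                [0.0, 0.5]])
R = np.array([2.0, 0.0, 1.0])
L = np.array([1.0, 3.0])

U, d, Vt = np.linalg.svd(A_c, full_matrices=False)

def score_with_r_terms(r):
    """Score from the transformed data R' = R (U D_r) and L' = V^T L."""
    R_prime = R @ (U[:, :r] * d[:r])
    L_prime = Vt[:r, :] @ L
    return float(np.sum(R_prime * L_prime))   # sum of r products

exact = float(R @ A_c @ L)   # receptor x pair potential x ligand
```

With r = 2 (nothing discarded) the two values agree to rounding error; with r = 1 the difference is bounded by the discarded singular value times the norms of `R` and `L`.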

以上に本発明の複合体構造予測の原理を説明した。本発明ではペアポテンシャル行列を圧縮しているので、計算量を低減できる。計算量が少ないので、より多くのタイプを持ったより大きなペアポテンシャル行列の処理も可能になる。このことは、スコア精度を高くして予測精度を向上するのに有利である。 The principle of composite structure prediction according to the present invention has been described above. In the present invention, since the pair potential matrix is compressed, the amount of calculation can be reduced. Since the amount of calculation is small, it is possible to process a larger pair potential matrix with more types. This is advantageous in improving the prediction accuracy by increasing the score accuracy.

Being able to use larger pair potential matrices with more types, as described above, also means that a wide variety of large pair potential matrices can be used. This helps broaden the range of application of prediction techniques that use pair potential matrices, and it is expected that complex structures of substances other than proteins will also become predictable. Examples of substances other than proteins are sugars and DNA. Substances from fields other than biotechnology, such as carbon nanotubes, are also possible. When predicting the complex structure of a substance other than a protein, a large number of types of constituent elements, such as atoms, makes the pair potential matrix large and increases the amount of computation. Applying the present invention compresses the pair potential matrix, however, so the increase in computation is suppressed.

実際、本実施の形態で説明するような高速フーリエ変換を用いた全解探索に関しては、タンパク質以外の物質へのペアポテンシャル行列の適用例はない。この理由は、ペアポテンシャル行列が上記のように大きくなってしまうことにあると考えられる。これに対して、本発明によれば、複合体構造予測の汎用性を増大し、適用範囲を拡大し、タンパク質以外の物質にも適用可能になることを期待できる。 Actually, there is no application example of the pair potential matrix to a substance other than protein for the full solution search using the fast Fourier transform as described in the present embodiment. The reason for this is considered that the pair potential matrix becomes large as described above. On the other hand, according to the present invention, it can be expected that the versatility of complex structure prediction is increased, the scope of application is expanded, and it can be applied to substances other than proteins.

次に、本実施の形態に係る複合体構造予測装置の具体的構成について説明する。図８は、複合体構造予測装置を実現するコンピュータの構成をハードウエア面から示している。図８において、コンピュータ１は、例えばパーソナルコンピュータであるが、より大型のコンピュータでもよい。プログラム実行部３は、ＣＰＵ等のプロセッサで構成され、プログラム記憶部５および処理データ記憶部７はメモリで構成される。また、コンピュータは、ハードディスク等の外部記憶装置１１を備え、さらに、入力装置１３、出力装置１５、記録媒体装着部１７および通信部１９などを備えている。 Next, a specific configuration of the complex structure prediction apparatus according to the present embodiment will be described. FIG. 8 shows the configuration of a computer that realizes the composite structure prediction apparatus from the hardware aspect. In FIG. 8, the computer 1 is a personal computer, for example, but may be a larger computer. The program execution unit 3 includes a processor such as a CPU, and the program storage unit 5 and the processing data storage unit 7 include a memory. The computer also includes an external storage device 11 such as a hard disk, and further includes an input device 13, an output device 15, a recording medium mounting unit 17, a communication unit 19, and the like.

プログラム記憶部５は、本実施の形態の複合体構造予測装置および方法を実現するためのプログラムを記憶し、特に、ドッキングされるべき立体構造データを入力し、ペアポテンシャル行列に対して圧縮等の変形を行い、立体構造データを回転し、グリッドデータを生成し、ＦＦＴによるドッキングスコア計算を行い、複合体構造の候補を選定し、予測結果を出力するためのプログラムを記憶する。こうした複合体構造予測のためのプログラムをコンピュータ１の主としてプログラム実行部３にて実行することにより、本実施の形態に係る複合体構造予測装置および方法が実現される。これらプログラムの内容は、以下の複合体構造予測装置の説明において述べられる。 The program storage unit 5 stores a program for realizing the complex structure prediction apparatus and method according to the present embodiment. In particular, the program storage unit 5 inputs three-dimensional structure data to be docked, and compresses the pair potential matrix. Deformation is performed, three-dimensional structure data is rotated, grid data is generated, docking score calculation is performed by FFT, complex structure candidates are selected, and a program for outputting a prediction result is stored. The complex structure prediction apparatus and method according to the present embodiment are realized by executing such a complex structure prediction program mainly in the program execution unit 3 of the computer 1. The contents of these programs are described in the description of the complex structure prediction apparatus below.

処理データ記憶部７は、処理されるべきデータや、処理後のデータを記憶する。処理データ記憶部７は、例えば、ドッキングされるべき入力構造のデータ、ペアポテンシャル行列、および、処理結果のドッキングスコアを記憶する。その他にも、メモリは、プログラム実行部３による処理の作業エリアとして機能し、ＦＦＴの作業エリアやデータ保存エリアなどとして使われる。 The processing data storage unit 7 stores data to be processed and data after processing. The processing data storage unit 7 stores, for example, input structure data to be docked, a pair potential matrix, and a processing result docking score. In addition, the memory functions as a work area for processing by the program execution unit 3, and is used as an FFT work area, a data storage area, and the like.

コンピュータ１へのデータの入出力は、典型的には、入力装置１３および出力装置１５を介して行われる。その他、データの入出力は、記録媒体装着部１７を介して、記録媒体との間で行われてもよい。また、データの入出力は、通信部１９を介して行われてよい。つまり、これら構成が入力部および出力部として機能してよい。 Data input / output to / from the computer 1 is typically performed via the input device 13 and the output device 15. In addition, data input / output may be performed between the recording medium and the recording medium via the recording medium mounting unit 17. Data input / output may be performed via the communication unit 19. That is, these configurations may function as an input unit and an output unit.

通信部１９を使う場合、コンピュータ１がＷＥＢサーバに接続され、ネットワークを介してデータが入出力されてよい。また、コンピュータ１がＷＥＢサーバの機能を有していてもよい。そして、ＷＥＢ経由でドッキングすべき構造データが入力され、予測結果がＷＥＢへの出力によって返送されてよい。 When the communication unit 19 is used, the computer 1 may be connected to a WEB server, and data may be input / output via a network. Further, the computer 1 may have the function of a WEB server. Then, structure data to be docked via WEB may be input, and a prediction result may be returned by output to WEB.

図９は、本実施の形態の複合体構造予測装置を機能ブロック図のかたちで示している。図示のように、複合体構造予測装置２１は、構造入力部２３、ペアポテンシャル行列記憶部２５、ペアポテンシャル行列変形部２７、構造回転部２９、グリッド処理部３１、スコア算出部３３、候補選定部３５および予測結果出力部３７を備えている。 FIG. 9 shows the complex structure prediction apparatus of the present embodiment in the form of a functional block diagram. As illustrated, the complex structure prediction apparatus 21 includes a structure input unit 23, a pair potential matrix storage unit 25, a pair potential matrix transformation unit 27, a structure rotation unit 29, a grid processing unit 31, a score calculation unit 33, and a candidate selection unit. 35 and a prediction result output unit 37.

The structure input unit 23 inputs the data of the first input structure and the second input structure to be docked. In the example of the present embodiment, the first input structure is a receptor and the second input structure is a ligand. The input structure data is coordinate data consisting of the position coordinates of each atom constituting the protein. As described above, the input device 13, the recording medium mounting unit 17, the communication unit 19, and the like of FIG. 8 can function as the structure input unit 23.

The pair potential matrix storage unit 25 stores a pair potential matrix such as the one illustrated in FIG. 5. The memory, the external storage device, and the like of FIG. 8 can suitably function as the pair potential matrix storage unit 25.

ペアポテンシャル行列変形部２７は、図示のように、圧縮部４１、特異値分解部４３、近似部４５、左行列生成部４７および右行列生成部４９を有する。圧縮部４１は、図７を用いて説明したように、レセプタに出現する任意の原子とリガンドに出現する任意の原子で作られる任意のペアのポテンシャルからなる行列へと、ペアポテンシャル行列Ａを変形し、これにより圧縮ペアポテンシャル行列Ａ’を生成する。より詳細には、圧縮部４１は、レセプタに出現する原子の全タイプと、リガンドに出現する原子の全タイプとをそれぞれ検出する。そして、両構造から検出されたタイプの任意の組合せに対応するポテンシャル値をペアポテンシャル行列から抽出する。抽出されたポテンシャル値を配列して、図７に示すような圧縮ペアポテンシャル行列を生成する。これにより、ペアポテンシャル行列は、予測対象のレセプタおよびリガンドに実際に出現する原子により作られ得るペアのポテンシャル値のみを持つ低次元の行列へと圧縮される。 The pair potential matrix transformation unit 27 includes a compression unit 41, a singular value decomposition unit 43, an approximation unit 45, a left matrix generation unit 47, and a right matrix generation unit 49 as illustrated. As described with reference to FIG. 7, the compression unit 41 transforms the pair potential matrix A into a matrix composed of an arbitrary pair of potentials formed by an arbitrary atom appearing in the receptor and an arbitrary atom appearing in the ligand. Thus, a compressed pair potential matrix A ′ is generated. More specifically, the compression unit 41 detects all types of atoms that appear in the receptor and all types of atoms that appear in the ligand. Then, a potential value corresponding to an arbitrary combination of types detected from both structures is extracted from the pair potential matrix. The extracted potential values are arranged to generate a compressed pair potential matrix as shown in FIG. As a result, the pair potential matrix is compressed into a low-dimensional matrix having only pair potential values that can be created by atoms actually appearing in the receptor and ligand to be predicted.

The singular value decomposition unit 43 performs singular value decomposition on the compressed pair potential matrix A' and decomposes it into an orthogonal matrix U, a diagonal matrix D, and an orthogonal matrix V. The approximation unit 45 approximates the compressed pair potential matrix A' by compressing the diagonal matrix D obtained from the singular value decomposition. As already described, the approximation process leaves r eigenvalues (the approximate eigenvalue number or approximate dimension number); the value of r may be specified by input from the input device 13 or the like. The left matrix generation unit 47 and the right matrix generation unit 49 then split the decomposed and approximated matrix to generate the left matrix UDr and the right matrix V^T described above.

構造回転部２９は、第２入力構造であるリガンドを回転する。予め設定された角度データに従い、リガンドは少しずつ回転される。リガンドが回転されるたびに、以下に説明するリガンドのグリッド化とスコア計算が行われ、これにより回転移動の探索が実現される。 The structure rotating unit 29 rotates the ligand that is the second input structure. The ligand is rotated little by little in accordance with preset angle data. Each time the ligand is rotated, the grid of the ligand and score calculation described below are performed, thereby realizing a search for rotational movement.

The grid processing unit 31 converts the receptor and the ligand, using the receptor (first input structure), the ligand (second input structure), and the compressed pair potential matrix A', to generate the first grid data R' and the second grid data L'. The left matrix UDr and the right matrix V^T obtained from the pair potential matrix transformation unit 27 are used as the compressed pair potential matrix A' (the product of the left matrix and the right matrix is the approximated compressed pair potential matrix A').

The grid processing unit 31 includes a first grid generation unit 51 and a second grid generation unit 53. The first grid generation unit 51 grids the receptor, which is the first input structure. The receptor is input in the form of coordinate data. The first grid generation unit 51 grids the receptor to convert it into vector data R, and then multiplies the receptor vector data R by the left matrix UDr to obtain the first grid data R'.

The second grid generation unit 53 grids the ligand, which is the second input structure. This gridding is applied to the ligand after it has been rotated by the structure rotation unit 29. The second grid generation unit 53 grids the ligand to convert it into vector data L, and then multiplies the right matrix V^T by the ligand vector data L to obtain the second grid data L'.
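The two gridding steps can be sketched as follows. The source does not specify how atoms are assigned to grid cells, so this sketch assumes simple per-type counting of atoms per cell with non-negative coordinates; `grid_by_type`, `project`, the toy coordinates, and the stand-in left matrix are all hypothetical.

```python
import numpy as np

def grid_by_type(coords, type_indices, n_types, n_grid, spacing):
    """Count atoms of each type in each cell of an n_grid^3 lattice.
    Assumes coordinates are already shifted so that all are >= 0."""
    grid = np.zeros((n_types, n_grid, n_grid, n_grid))
    for (x, y, z), t in zip(coords, type_indices):
        i, j, k = int(x // spacing), int(y // spacing), int(z // spacing)
        grid[t, i, j, k] += 1.0
    return grid

def project(factor, grid):
    """Contract the type axis: out[k] = sum_t factor[k, t] * grid[t]."""
    return np.tensordot(factor, grid, axes=1)

# Two receptor atoms; type indices are 0-based rows of the compressed matrix.
coords = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5)]
types = [0, 2]
R_grid = grid_by_type(coords, types, n_types=3, n_grid=4, spacing=1.0)

left = np.array([[1.0], [2.0], [3.0]])   # stand-in for UDr: 3 types, r = 1
R_prime = project(left.T, R_grid)        # shape (r, 4, 4, 4)
```

The ligand side would follow the same pattern with `project(right, L_grid)`, where `right` stands for V^T with shape (r, number of ligand types).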

スコア算出部３３は、第１グリッドデータＲ’および第２グリッドデータＬ’からドッキングスコアを算出する。 The score calculation unit 33 calculates a docking score from the first grid data R ′ and the second grid data L ′.

図１０は、スコア算出部３３の構成を示している。スコア計算では、従来技術の例と同様に、レセプタ側の第１グリッドデータＲ’が固定され、リガンド側の第２グリッドデータＬ’が移動される。そこで、図中では前者をＳｔａｔｉｃ、後者をＭｏｂｉｌｅと表している。 FIG. 10 shows the configuration of the score calculation unit 33. In the score calculation, the first grid data R ′ on the receptor side is fixed and the second grid data L ′ on the ligand side is moved, as in the example of the prior art. Therefore, in the figure, the former is represented as Static and the latter is represented as Mobile.

スコア算出部３３は、ＦＦＴ部６１、掛け算処理部６３、加算処理部６５および逆ＦＦＴ部６７を有している。従来はＳｔａｔｉｃとＭｏｂｉｌｅに対して個別にＦＦＴ処理が施されるが、本実施の形態では、ＳｔａｔｉｃとＭｏｂｉｌｅの組合せに対して、ＦＦＴ部６１でＦＦＴ処理が行われる。そして、ＦＦＴ後のＳｔａｔｉｃとＭｏｂｉｌｅが掛け算処理部６３で掛け算される。これらの処理は、ｒ回行われる。ｒは、近似固有値数または近似次元数（圧縮ペアポテンシャル行列Ａ’を特異値分解して得られる対角行列の近似処理で残された固有値の個数）である。そして、ｒ個の演算結果の和が加算処理部６５により求められ、さらに、逆ＦＦＴ処理が逆ＦＦＴ部６７により施されて、ドッキングスコアが求められる。 The score calculation unit 33 includes an FFT unit 61, a multiplication processing unit 63, an addition processing unit 65, and an inverse FFT unit 67. Conventionally, FFT processing is individually performed on Static and Mobile, but in this embodiment, FFT processing is performed by the FFT unit 61 for the combination of Static and Mobile. Then, Static and Mobile after FFT are multiplied by the multiplication processing unit 63. These processes are performed r times. r is the number of approximate eigenvalues or the number of approximate dimensions (the number of eigenvalues left in the approximation process of the diagonal matrix obtained by singular value decomposition of the compressed pair potential matrix A ′). Then, the sum of the r calculation results is obtained by the addition processing unit 65, and further, an inverse FFT process is performed by the inverse FFT unit 67 to obtain a docking score.
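The order of operations described here (r rounds of FFT and multiplication, accumulation of the products, then a single inverse FFT) can be sketched with NumPy. This sketch makes two simplifications that are assumptions, not statements about the patent's implementation: it transforms Static and Mobile separately rather than as a combined input, and it uses cyclic shifts without zero padding. The name `fft_docking_scores` is hypothetical.

```python
import numpy as np

def fft_docking_scores(R_prime, L_prime):
    """R_prime, L_prime: arrays of shape (r, N, N, N).
    Returns scores[a, b, c] = sum_k sum_xyz R'_k(x, y, z) * L'_k(x+a, y+b, z+c),
    with cyclic wrap-around of the shifted indices."""
    acc = np.zeros(R_prime.shape[1:], dtype=complex)   # single accumulator
    for R_k, L_k in zip(R_prime, L_prime):             # r rounds
        acc += np.conj(np.fft.fftn(R_k)) * np.fft.fftn(L_k)
    return np.fft.ifftn(acc).real                      # one inverse FFT

rng = np.random.default_rng(0)
R_prime = rng.normal(size=(2, 4, 4, 4))   # toy data: r = 2, N = 4
L_prime = rng.normal(size=(2, 4, 4, 4))
scores = fft_docking_scores(R_prime, L_prime)   # one score per translation
```

Because the products are summed in the frequency domain, only one accumulator array is needed regardless of r.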

図１１は、図１０で説明したスコア算出部３３の処理に必要なメモリサイズを示している。スコア算出部３３は、概略的には、２つのメモリ領域を使用する。図１１の左側の３つの領域は一つの同じメモリ領域であり、これが第１メモリ領域７１である。そして、右端の１つが第２メモリ領域７３である。第１メモリ領域７１は、Ｓｔａｔｉｃ（第１グリッドデータＲ’）およびＭｏｂｉｌｅ（第２グリッドデータＬ’）を共に入力する領域であり、また、ＦＦＴの作業領域である。第２メモリ領域７３は、和の保存領域である。 FIG. 11 shows the memory size required for the processing of the score calculation unit 33 described in FIG. The score calculation unit 33 generally uses two memory areas. The three areas on the left side of FIG. 11 are the same memory area, which is the first memory area 71. One of the right ends is the second memory area 73. The first memory area 71 is an area for inputting both Static (first grid data R ′) and Mobile (second grid data L ′), and is an FFT work area. The second memory area 73 is a sum storage area.

各々のメモリ領域は、Ｎ×Ｎ×Ｎの大きさを有する。Ｎは、３次元ＦＦＴにおける一次元方向のサイズであり、グリッド数で表される。つまり、スコア算出部３３では、Ｎ×Ｎ×Ｎのグリッドの空間でドッキング構造を探索する。 Each memory area has a size of N × N × N. N is the size in the one-dimensional direction in the three-dimensional FFT, and is represented by the number of grids. That is, the score calculation unit 33 searches for a docking structure in an N × N × N grid space.

さて、従来技術ではＳｔａｔｉｃとＭｏｂｉｌｅに対して別々にＦＦＴ処理が行われていたのに対して、本実施の形態では、図１０に示したように、ＳｔａｔｉｃとＭｏｂｉｌｅの組合せに対してＦＦＴ処理が行われる。そこで、図１１に示すように、第１メモリ領域７１にＳｔａｔｉｃとＭｏｂｉｌｅの組合せが入力される。 In the prior art, FFT processing is separately performed for Static and Mobile. In the present embodiment, as shown in FIG. 10, FFT processing is performed for the combination of Static and Mobile. Done. Therefore, as shown in FIG. 11, a combination of Static and Mobile is input to the first memory area 71.

As illustrated, the input Static and Mobile are subjected to FFT processing in the first direction (x direction) and then to FFT processing in the second direction (y direction). FFT processing in the third direction (z direction) and the multiplication are then performed, and the result is stored in the second memory area 73.

In this way, the present embodiment carries out the score calculation with the two memory areas 71 and 73. As shown at the left end of FIG. 11, the input requires 2 × (N/2) × (N/2) × (N/2). The FFT work area requires N × N × N, and the sum storage area also requires N × N × N. As illustrated, the input area (2 × (N/2) × (N/2) × (N/2)) is contained within the FFT work area (N × N × N), so the total required memory size is 2 × N × N × N.

In practice, N is assumed to be 50 to 350. With N = 350, the memory area corresponding to N × N × N is approximately 200 MB. The memory size required by the score calculation unit 33 can therefore be said to be approximately 400 MB or less.
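The estimate above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the text does not state the size of one grid cell, so the 4-byte default (single-precision real) is an assumption, and the helper name `score_memory_bytes` is hypothetical. Under that assumption, N = 350 gives about 172 MB per area and about 343 MB for the two areas, consistent in order of magnitude with the figures quoted above.

```python
def score_memory_bytes(n, bytes_per_cell=4, areas=2):
    """Bytes needed for `areas` cubic regions of n x n x n grid cells.

    bytes_per_cell=4 is an assumed cell size; the text does not specify it.
    areas=2 corresponds to the first and second memory areas 71 and 73.
    """
    return areas * n ** 3 * bytes_per_cell

for n in (50, 350):
    per_area = score_memory_bytes(n, areas=1)
    total = score_memory_bytes(n)
    print(f"N={n}: {per_area / 1e6:.1f} MB per area, {total / 1e6:.1f} MB in total")
```

The function has no parameter for r, which mirrors the point that the required memory does not grow with the number of retained eigenvalues.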

A feature of the above configuration is that the memory size required for score calculation is not proportional to r (the number of approximate eigenvalues or approximate dimensions, that is, the number of eigenvalues retained when the diagonal matrix obtained by singular value decomposition of the compressed pair potential matrix is approximated). In the prior art, the required memory size grew in proportion to r. Consequently, increasing r does not make the memory size enormous. r can therefore be increased to improve score accuracy and, in turn, prediction accuracy. High accuracy is obtained even in a memory-constrained environment.

Returning to FIG. 9, the candidate selection unit 35 selects candidates for the complex structure using the docking scores of the receptor, which is the first input structure, and the ligand, which is the second input structure. In the complex structure prediction apparatus 21 of the present embodiment, the ligand is rotated. For each angle of the ligand, the score calculation unit 33 calculates the docking score at each arrangement obtained by translating the receptor and the ligand relative to each other. The score calculation unit 33 thus calculates the docking score S for every relative arrangement obtained by translating and rotating the receptor and the ligand in small steps. The many docking scores S obtained in this way are compared, and a predetermined number of relative arrangements, taken in descending order of docking score S, are obtained as candidates for the complex structure.

For example, suppose the translation range in each direction is 80 grid points and the rotation is performed in 6-degree steps in two directions. In this case, 80 × 80 × 80 × 60 × 60 docking scores S are obtained. About 1000 relative arrangements, taken in descending order of docking score S, are then obtained as data representing the candidates for the complex structure.
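The size of this search and the selection of the highest-scoring arrangements can be sketched as below. The helper names `count_poses` and `top_candidates`, and the `(score, pose)` tuple layout, are assumptions made for illustration; the toy scores are arbitrary. With 80 grid points per axis and 6-degree steps in two rotation directions, the count is 1,843,200,000 arrangements, which is why a bounded selection is used instead of sorting everything.

```python
import heapq
from itertools import product

def count_poses(translations_per_axis, step_deg, rotation_axes=2):
    """Number of relative arrangements scored in an exhaustive search."""
    rotations_per_axis = 360 // step_deg
    return translations_per_axis ** 3 * rotations_per_axis ** rotation_axes

def top_candidates(scored_poses, k=1000):
    """Keep the k arrangements with the largest docking score.

    scored_poses: iterable of (score, pose) tuples (assumed layout).
    heapq.nlargest holds at most k items at a time and returns them
    in descending order of score.
    """
    return heapq.nlargest(k, scored_poses, key=lambda sp: sp[0])

total = count_poses(80, 6)

# Toy example with arbitrary scores on a 5 x 5 set of hypothetical poses.
poses = [(float((3 * i + 7 * j) % 11), (i, j))
         for i, j in product(range(5), range(5))]
best = top_candidates(poses, k=3)
```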

The prediction result output unit 37 outputs the prediction result for the complex structure. As the prediction result, the data of the complex structure candidates described above are output together with the docking score S of each candidate. As described earlier, the output device 15, the recording medium mounting unit 17, the communication unit 19, and the like of FIG. 1 can function as the prediction result output unit 37.

The configuration of each part of the complex structure prediction apparatus 21 according to the present embodiment has been described above. Next, the operation of the complex structure prediction apparatus 21 is described with reference to FIG. 12. In FIG. 12, data names are shown in rectangular frames and process names in elliptical frames.

As shown in FIG. 12, when the first input structure (receptor) and the second input structure (ligand) are input from the structure input unit 23, the pair potential matrix is read from the pair potential matrix storage unit 25.

Using the first input structure and the second input structure, the pair potential matrix transformation unit 27 then compresses the pair potential matrix (S1). As described above, the pair potential matrix is compressed into a matrix that holds only the potentials of the atom pairs appearing in the first input structure and the second input structure (that is, the types that do not appear are deleted). Singular value decomposition of the compressed pair potential matrix A' and approximation of the diagonal matrix D are then performed. The compressed pair potential matrix A' is split to obtain the left matrix UDr and the right matrix V^T.
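Step S1 can be illustrated with the following sketch. It is a minimal example under stated assumptions: rows of the matrix are taken to index receptor atom types and columns ligand atom types, the function name `compress_and_factor` and its arguments are hypothetical, and `numpy.linalg.svd` stands in for whatever decomposition routine the apparatus uses.

```python
import numpy as np

def compress_and_factor(pair_potential, receptor_types, ligand_types, r):
    """Compress a pair potential matrix and split it into UDr and V^T.

    pair_potential: 2-D array indexed by (receptor type, ligand type).
    receptor_types, ligand_types: type indices of the atoms in each structure.
    r: number of eigenvalues to keep in the approximation.
    """
    rows = sorted(set(receptor_types))        # types that appear in the receptor
    cols = sorted(set(ligand_types))          # types that appear in the ligand
    a_comp = pair_potential[np.ix_(rows, cols)]   # non-appearing types removed
    u, d, vt = np.linalg.svd(a_comp, full_matrices=False)
    r = min(r, d.size)
    left = u[:, :r] * d[:r]    # U D_r: column k of U scaled by the k-th value
    right = vt[:r, :]          # V^T restricted to the retained components
    return rows, cols, a_comp, left, right

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6))                   # hypothetical 6-type potential
rows, cols, A_c, left, right = compress_and_factor(
    A, receptor_types=[0, 2, 2, 5], ligand_types=[1, 3, 3], r=2)
```

In this toy case the compressed matrix is 3 × 2, so keeping r = 2 values reproduces it exactly; a smaller r gives a low-rank approximation.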

The ligand, which is the second input structure, is rotated by the structure rotation unit 29 according to preset angle data to obtain a rotated structure (S3).

The receptor, which is the first input structure, is gridded by the first grid generation unit 51, and the first grid data R' are obtained using the left matrix UDr (S5). The rotated ligand, which is the second input structure, is gridded by the second grid generation unit 53, and the second grid data L' are obtained using the right matrix V^T (S7).
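The role of the two matrices in steps S5 and S7 can be sketched as follows. This is a simplified illustration, not the gridding defined for the grid generation units 51 and 53: it assumes each atom contributes to a single grid cell, the function name `make_grids` and the `(type_row, cell)` atom layout are hypothetical, and the small matrices are arbitrary. Under these assumptions, the overlap of the r-component grids equals the sum of the compressed pair potential entries over the atom pairs that share a cell.

```python
import numpy as np

def make_grids(atoms, weights, n):
    """Build r grids of size n x n x n from typed atoms.

    atoms: list of (type_row, (ix, iy, iz)) with integer grid coordinates.
    weights: array of shape (num_types, r); use the left matrix for the
             receptor and the transposed right matrix for the ligand.
    """
    grids = np.zeros((weights.shape[1], n, n, n))
    for row, (ix, iy, iz) in atoms:
        grids[:, ix, iy, iz] += weights[row]   # r weights for this atom type
    return grids

left = np.array([[1.0, 0.0], [0.5, 2.0]])     # stands in for UDr (types x r)
right = np.array([[2.0, 1.0], [0.0, 3.0]])    # stands in for V^T (r x types)
A_c = left @ right                            # implied compressed potential

receptor = [(0, (1, 1, 1)), (1, (2, 1, 1))]
ligand = [(1, (1, 1, 1)), (0, (2, 1, 1))]
R_grid = make_grids(receptor, left, 4)
L_grid = make_grids(ligand, right.T, 4)
overlap = float(np.sum(R_grid * L_grid))      # score at zero translation
```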

The score calculation unit 33 then performs a convolution calculation on the first grid data R' and the second grid data L' to obtain the docking score S (S9). As a filtering step, candidates for the complex structure are selected by the candidate selection unit 35 (S11), and the selected candidates are output from the prediction result output unit 37 as output data together with their scores.

FIG. 13 is an example of output data and shows a calculation example produced by the complex structure prediction apparatus 21 of the present embodiment. In this example, the size of the grid on which the 3D-FFT was performed (the number of grid points in one direction) is 80, and the grid spacing (cell size) is 1.2 angstroms. The center position of the ligand and of the receptor is the center coordinate of the sphere enclosing each object. In the output data, the relative arrangements of the ligand and the receptor are listed as complex structure candidates in descending order of docking score S. A relative arrangement is expressed by the rotation angle of the ligand and the displacement of the ligand with the above center position as the origin. The docking score S of each relative arrangement is also output, as illustrated.

A preferred embodiment of the present invention has been described above. According to the present embodiment, the pair potential matrix is compressed as described, and the docking score is calculated using the compressed pair potential matrix. The amount of computation in the docking score calculation can therefore be reduced. Because the amount of computation is reduced, large pair potentials with more types can also be processed. This improves score accuracy and also widens the range of prediction targets, so that complex structures of substances other than proteins can be predicted.

Being able to process a large pair potential matrix as described above is advantageous for raising score accuracy and thereby improving prediction accuracy and, as already explained, for widening the range of application of prediction techniques that use pair potentials. Application to substances other than proteins is therefore expected. As already mentioned, substances other than proteins include, for example, sugars and DNA, and may also be substances from fields other than biotechnology. When predicting the complex structure of a substance other than a protein, a large number of types of constituent elements such as atoms makes the pair potential matrix large, and the amount of computation becomes enormous in the prior art. When the present invention is applied, however, the pair potential matrix is compressed, so the increase in computation is suppressed and a large pair potential matrix can be processed.

Further, according to the present invention, the compressed pair potential matrix is decomposed by singular value decomposition into a first orthogonal matrix, a diagonal matrix, and a second orthogonal matrix, and an approximation is performed that sets some of the eigenvalues of the diagonal matrix to 0 so that only the approximate number of eigenvalues remains. Performing singular value decomposition in this way allows the compressed pair potential matrix to be used suitably for docking score calculation, and the approximation reduces the amount of computation.

Further, according to the present invention, the first orthogonal matrix, the approximated diagonal matrix, and the second orthogonal matrix are used as the compressed pair potential matrix. The first structure grid data are generated from the first input structure, the first orthogonal matrix, and the approximated diagonal matrix, and the second structure grid data are generated from the second input structure and the second orthogonal matrix. Using such grid data allows the score to be calculated suitably.

Further, according to the present invention, the score calculation unit applies FFT processing and multiplication to the combination of the first structure grid data and the second structure grid data a number of times corresponding to the approximate eigenvalue number, which is the number of eigenvalues retained in the approximation, and obtains the sum of the results. With the score calculation unit configured in this way, the memory size required for score calculation is no longer proportional to the approximate eigenvalue number (the number of eigenvalues retained when the diagonal matrix is approximated), as explained with reference to FIG. 11. The memory size can therefore be reduced. In addition, more eigenvalues can be retained in the approximation, which improves score accuracy. The docking score can thus be calculated with higher accuracy using less memory.

Further, according to the present invention, the score calculation unit calculates the score using a first memory area for performing FFT processing on the combination of the first structure grid data and the second structure grid data and a second memory area for obtaining the sum of the processing results. As described above, this reduces the memory size and allows more eigenvalues to be retained in the approximation, improving accuracy.

Further, according to the present invention, the docking score is calculated for each relative arrangement obtained when the first structure grid data and the second structure grid data are moved relative to each other. The complex structure can thus be predicted through a search by translation.

Further, according to the present invention, at least one of the first input structure and the second input structure is rotated before gridding. In the example described here, the ligand is rotated. The complex structure can thus be predicted through a search by rotation.

Further, according to the present invention, the docking scores of the respective relative arrangements of the first input structure and the second input structure are compared, and candidates for the complex structure are selected. Candidates for the complex structure are thus obtained as the prediction result.

A preferred embodiment of the present invention has been described above. The present invention is not, however, limited to the embodiment described above, and those skilled in the art can of course modify the embodiment within the scope of the present invention.

As described above, the complex structure prediction apparatus according to the present invention can predict the structure of complexes of proteins and the like, and is useful in fields such as biotechnology and bioinformatics.

A diagram showing an example of grid data used for complex structure prediction.
A diagram showing search processing by translation in complex structure prediction.
A diagram showing search processing by rotation in complex structure prediction.
A block diagram showing a configuration for score calculation in the prior art.
A diagram showing an example of a pair potential matrix.
A block diagram showing a configuration for docking score calculation in the prior art.
A diagram showing compression of the pair potential matrix.
A diagram showing the configuration of a computer that implements the complex structure prediction apparatus according to the embodiment of the present invention.
A block diagram showing the configuration of the complex structure prediction apparatus according to the embodiment of the present invention.
A diagram showing the configuration of the score calculation unit.
A diagram showing the memory size required for score calculation in the present embodiment.
A diagram showing the operation of the complex structure prediction apparatus according to the present embodiment.
A diagram showing an example of a complex structure prediction result.

Explanation of symbols

21 Complex structure prediction apparatus
23 Structure input unit
25 Pair potential matrix storage unit
27 Pair potential matrix transformation unit
29 Structure rotation unit
31 Grid processing unit
33 Score calculation unit
35 Candidate selection unit
37 Prediction result output unit

Claims

A complex structure prediction apparatus that predicts the complex structure formed when a plurality of three-dimensional structures are docked, by calculating a docking score representing the interaction energy according to the relative arrangement of the plurality of three-dimensional structures, the apparatus comprising:
a structure input unit that inputs a first input structure and a second input structure, which are the three-dimensional structures to be docked;
a pair potential matrix storage unit that stores a pair potential matrix in which elements that can constitute a three-dimensional structure are classified into a plurality of types and the interaction energies of arbitrary pairs formed from the plurality of types are arranged;
a pair potential matrix transformation unit that compresses the pair potential matrix based on the first input structure and the second input structure to generate a compressed pair potential matrix in which the interaction energies of arbitrary pairs, each formed of an element appearing in the first input structure and an element appearing in the second input structure, are arranged;
a grid processing unit that, using the first input structure, the second input structure, and the compressed pair potential matrix, transforms the first input structure and the second input structure to generate first structure grid data and second structure grid data, each being a collection of per-grid data for score calculation; and
a score calculation unit that calculates the docking score of the first input structure and the second input structure from the first structure grid data and the second structure grid data.

The complex structure prediction apparatus according to claim 1, wherein the pair potential matrix transformation unit further performs singular value decomposition that decomposes the compressed pair potential matrix into a first orthogonal matrix, a diagonal matrix, and a second orthogonal matrix, and performs an approximation that sets some of the eigenvalues of the diagonal matrix to 0 so that the approximate number of eigenvalues remains.

The complex structure prediction apparatus, wherein the grid processing unit uses the first orthogonal matrix, the approximated diagonal matrix, and the second orthogonal matrix as the compressed pair potential matrix, generates the first structure grid data from the first input structure, the first orthogonal matrix, and the approximated diagonal matrix, and generates the second structure grid data from the second input structure and the second orthogonal matrix.

The complex structure prediction apparatus according to claim 2, wherein the score calculation unit applies FFT processing and multiplication to the combination of the first structure grid data and the second structure grid data a number of times corresponding to the approximate eigenvalue number, which is the number of eigenvalues retained in the approximation, and obtains the sum of the processing results.

The complex structure prediction apparatus according to claim 4, wherein the score calculation unit calculates the score using a first memory area for performing FFT processing on the combination of the first structure grid data and the second structure grid data and a second memory area for obtaining the sum of the processing results.

The complex structure prediction apparatus according to claim 1, wherein the score calculation unit calculates the docking score in each relative arrangement obtained when the first structure grid data and the second structure grid data are moved relative to each other.

The complex structure prediction apparatus according to claim 1, further comprising a structure rotation unit that rotates at least one of the first input structure and the second input structure before gridding.

The complex structure prediction apparatus according to any one of claims 1 to 7, further comprising a candidate selection unit that compares the docking scores in the respective relative arrangements calculated by the score calculation unit and selects candidates for the complex structure.

A complex structure prediction method for predicting the complex structure formed when a plurality of three-dimensional structures are docked, by calculating a docking score representing the interaction energy according to the relative arrangement of the plurality of three-dimensional structures, the method comprising:
inputting a first input structure and a second input structure, which are the three-dimensional structures to be docked;
reading, from a pair potential matrix storage unit, a pair potential matrix in which elements that can constitute a three-dimensional structure are classified into a plurality of types and the interaction energies of arbitrary pairs formed from the plurality of types are arranged;
compressing the pair potential matrix based on the first input structure and the second input structure to generate a compressed pair potential matrix in which the interaction energies of arbitrary pairs, each formed of an element appearing in the first input structure and an element appearing in the second input structure, are arranged;
transforming the first input structure and the second input structure, using the first input structure, the second input structure, and the compressed pair potential matrix, to generate first structure grid data and second structure grid data, each being a collection of per-grid data for score calculation; and
calculating the docking score of the first input structure and the second input structure from the first structure grid data and the second structure grid data.

A complex structure prediction program that causes a computer to execute processing for predicting the complex structure formed when a plurality of three-dimensional structures are docked, by calculating a docking score representing the interaction energy according to the relative arrangement of the plurality of three-dimensional structures, the program causing the computer to execute:
acquiring a first input structure and a second input structure, which are the three-dimensional structures to be docked;
reading, from a pair potential matrix storage unit, a pair potential matrix in which elements that can constitute a three-dimensional structure are classified into a plurality of types and the interaction energies of arbitrary pairs formed from the plurality of types are arranged;
compressing the pair potential matrix based on the first input structure and the second input structure to generate a compressed pair potential matrix in which the interaction energies of arbitrary pairs, each formed of an element appearing in the first input structure and an element appearing in the second input structure, are arranged;
transforming the first input structure and the second input structure, using the first input structure, the second input structure, and the compressed pair potential matrix, to generate first structure grid data and second structure grid data, each being a collection of per-grid data for score calculation; and
calculating the docking score of the first input structure and the second input structure from the first structure grid data and the second structure grid data.