JP7541709B2

JP7541709B2 - Protein binding site information acquisition device, and operation method and program of protein binding site information acquisition device

Info

Publication number: JP7541709B2
Application number: JP2020048625A
Authority: JP
Inventors: 泰己上田; 桂彦松本; 頌子原田
Original assignee: Cubicstars
Current assignee: Cubicstars
Priority date: 2020-03-19
Filing date: 2020-03-19
Publication date: 2024-08-29
Anticipated expiration: 2040-03-19
Also published as: JP2021145604A

Description

The present invention relates to a protein binding site information acquisition device, a method for operating the protein binding site information acquisition device, and a program.

Antibodies have high specificity and binding ability for antigens, and are therefore used in basic research in immunostaining, ELISA, Western blotting, and other methods. Antibodies are also widely used in medicine, both as tissue-staining antibodies for pathological diagnosis and as antibody pharmaceuticals. To make full use of an antibody's antigen recognition ability, it is important to know the amino acid sequence of the antibody-binding site (epitope) in the antigen and then determine the optimal reaction conditions on a scientific basis.

Epitope analysis methods include X-ray crystallography and hydrogen-deuterium exchange mass spectrometry. These methods require a sufficient amount of antigen-antibody complex that retains the protein structure, which takes time and money. A simpler way to identify epitopes is epitope mapping analysis using peptide arrays. However, the library that can be constructed for epitope mapping analysis is limited to about 100 to 1000 peptides, so the method is difficult to use unless the antigen protein and its amino acid sequence are already known.

More than 4 million commercial antibodies currently exist, and neither the epitope analysis methods nor the epitope mapping analysis described above offers the throughput needed at this scale. As a result, the epitope of most antibodies is either unknown or narrowed down only to the full-length protein sequence or to a stretch of several tens to several hundred amino acids. Antigen-antibody reaction protocols therefore have not been standardized on a scientific basis, and staining results differ between laboratories even when an antibody is bound to the same antigen. This problem is especially serious in pathological diagnosis, where it can prevent an accurate assessment of a patient's health condition.

Compared with peptide arrays, peptide selection can prepare libraries of more than 10^8 members, which makes it possible to identify epitopes even when the antigen protein is unknown. Peptide selection is a system that acquires, from a peptide library constructed by transcribing and translating a DNA library with random sequences, only those peptides that bind to a target, and then identifies their sequences. An important feature of this system is that the genetic information of the acquired peptides is preserved via cells, phages, ribosomes, or puromycin. Repeating the cycle of transcription and translation of the preserved genetic information, binding to the target, and recovery causes the pool to converge on the genetic information of peptides that bind strongly to the target.

When applied to epitope analysis, peptide selection is useful in two respects. First, epitope analysis is possible as long as the antibody is available, even when the antigen is completely unknown. Second, because the library is large, the results provide more than sequence information: the relative abundance of the converged amino acid sequences shows which amino acid residues of the epitope are particularly important for antigen recognition (the recognition mode of the antibody). For example, Non-Patent Document 1 describes that the ribosome display method can be applied to determining the amino acid sequence of an epitope.

Meanwhile, Patent Document 1 discloses a computer-based epitope prediction method. In this method, epitope candidates in a protein to be predicted are obtained from the output of a learning model trained on the relationship between training-target proteins and the epitopes detected in those proteins.

Patent Document 1: JP 2019-179356 A

Non-Patent Document 1: Larry C. Mattheakis and 2 others, "An in vitro polysome display system for identifying ligands from very large peptide libraries," Proc. Natl. Acad. Sci. USA, 1994, 91, pp. 9022-9026

In the epitope prediction method disclosed in Patent Document 1, constructing the learning model requires, as information about the antigen peptide, the position of the peptide within the protein, the amino acid sequence contained in the peptide, scores such as the hydrophobicity index of each amino acid and information about the α-helix, and information indicating whether or not the peptide is an epitope. As described above, the epitope of most antibodies is either unknown or identified only down to the full-length protein sequence or a stretch of tens to hundreds of amino acids. Under these circumstances sufficient training data cannot be collected, and the predictions of the learning model cannot be said to be highly accurate.

The present invention has been made in view of the above circumstances, and its object is to provide a protein binding site information acquisition device, a method for operating the protein binding site information acquisition device, and a program that can obtain, with high accuracy, information on the amino acid sequence of the epitope of an antibody in an antigen.

A protein binding site information acquisition device according to a first aspect of the present invention comprises:
an acquisition unit that acquires information on the amino acid sequence of the epitope of an antibody in an antigen, based on a comparison between first information and second information. The first information indicates a distribution of similarity between a partial sequence of the antigen and a plurality of random amino acid sequences. The second information indicates a distribution of similarity between the partial sequence and the amino acid sequences of peptides that bind to the antibody, those sequences having converged through repeated rounds in which peptides, generated via transcription and translation from DNA contained in a DNA library having random base sequences and kept associated with that DNA, are bound to the antibody that binds to the antigen and recovered together with the DNA.

In this case, the acquisition unit may
calculate a distance between the distribution of similarity indicated by the first information and the distribution of similarity indicated by the second information, and evaluate the specificity of binding of the antibody to the epitope.

The acquisition unit may also
predict the amino acid sequence of the epitope based on the amino acid sequences of peptides selected by comparing the first information with the similarity between the partial sequence and the amino acid sequences of the peptides that bind to the antibody.

A method for operating a protein binding site information acquisition device according to a second aspect of the present invention is
a method for operating a protein binding site information acquisition device including an acquisition unit, in which
the acquisition unit
acquires information on the amino acid sequence of the epitope of an antibody in an antigen, based on a comparison between first information and second information. The first information indicates a distribution of similarity between a partial sequence of the antigen and a plurality of random amino acid sequences. The second information indicates a distribution of similarity between the partial sequence and the amino acid sequences of peptides that bind to the antibody, those sequences having converged through repeated rounds in which peptides, generated via transcription and translation from DNA contained in a DNA library having random base sequences and kept associated with that DNA, are bound to the antibody that binds to the antigen and recovered together with the DNA.

A program according to a third aspect of the present invention
causes a computer to function as:
means for referring to first information indicating a distribution of similarity between a partial sequence of an antigen and a plurality of random amino acid sequences;
means for comparing the first information with second information indicating a distribution of similarity between the partial sequence and the amino acid sequences of peptides that bind to an antibody, those sequences having converged through repeated rounds in which peptides, generated via transcription and translation from DNA contained in a DNA library having random base sequences and kept associated with that DNA, are bound to the antibody that binds to the antigen and recovered together with the DNA; and
means for acquiring, based on the comparison, information on the amino acid sequence of the epitope of the antibody in the antigen.

According to the present invention, information on the amino acid sequence of the epitope of an antibody in an antigen can be obtained with high accuracy.

(A) is a block diagram showing the hardware configuration of the protein binding site information acquisition device according to an embodiment of the present invention. (B) is a block diagram showing the functions of the protein binding site information acquisition device.
A diagram showing the calculation of similarity between a partial sequence contained in a protein and multiple random amino acid sequences.
A diagram showing the distribution of the probability of the similarity of random amino acid sequences to a partial sequence.
A diagram showing the probability that a random amino acid sequence has at least a predetermined similarity to a partial sequence.
A diagram showing the calculation of similarity between all partial sequences contained in a protein and random amino acid sequences.
A diagram showing, together with the distribution shown in FIG. 4 of the probability that a random amino acid sequence has at least a predetermined similarity to the partial sequence, the distribution of the probability that an amino acid sequence converged by peptide selection has at least a predetermined similarity to the partial sequence.
A diagram illustrating an example of a formula for calculating the distance between probability distributions.
A diagram showing the frequency of occurrence of each amino acid at each position in aligned partial sequences.
A flowchart of the information acquisition process performed by the protein binding site information acquisition device according to the embodiment shown in FIG. 1.
(A) shows the amino acid sequence of c-fos. (B) shows motifs created from peptides that converged against five anti-c-fos antibodies.
(A) shows the amino acid sequence of Neuronal Nuclei (NeuN). (B) shows motifs created from peptides that converged against anti-NeuN antibodies.
(A) shows the amino acid sequence of tyrosine hydroxylase (TH). (B) shows motifs created from peptides that converged against anti-TH antibodies.
(A) shows the amino acid sequence of c-fos. (B) shows motifs created from peptides that converged against anti-c-fos polyclonal antibodies.
(A) shows the amino acid sequence of the dopamine transporter (DAT). (B) shows motifs created from peptides that converged against anti-DAT polyclonal antibodies.
(A) shows the amino acid sequence of c-fos. (B) shows the motifs obtained for each antibody clone by epitope analysis and the positions of the amino acids substituted in the mutant c-fos proteins.
A diagram showing the results of examining, by ELISA (Enzyme-Linked Immunosorbent Assay), the interactions between wild-type and mutant c-fos proteins and antibodies.
A diagram showing the positions of the amino acids substituted in the mutant c-fos proteins.
A diagram showing the results of examining, by ELISA, the validity of the molecular recognition mode obtained by the DECODE method.
A diagram showing, for anti-c-fos antibodies, the similarity of the 100 proteins with the highest similarity.
A diagram showing motifs obtained by analyzing, by the DECODE method, the antibodies in the plasma of mice in which autoimmune disease was induced.
(A) shows the top-scoring partial sequences calculated from amino acid sequences obtained by analyzing, by the DECODE method, the antibodies in the plasma of mice in which autoimmune disease was induced. (B) shows the top-scoring partial sequences calculated from amino acid sequences obtained by analyzing, by the DECODE method, the antibodies in the plasma of mice in which autoimmune disease was not induced.

An embodiment of the present invention is described below. The present invention is not limited to the following embodiment.

(Embodiment)
A protein binding site information acquisition device (hereinafter also simply called the "information acquisition device") 100 according to this embodiment is described. The information acquisition device 100 is a device for analyzing data obtained by peptide selection. Peptide selection is a system that acquires peptides that bind to a test substance from a peptide library constructed by transcribing and translating a DNA library having random base sequences, and identifies the amino acid sequences of the acquired peptides. In peptide selection, the genetic information of the acquired peptides is preserved by a predetermined means. Repeating the cycle of transcription and translation of the preserved genetic information, binding to the test substance, and recovery causes the genetic information encoding peptides that bind to the test substance to converge.

The information acquisition device 100 can be applied to data obtained by any known peptide selection method. Examples of peptide selection include cell surface display, phage display, ribosome display, and mRNA display. In cell surface display, peptides produced by a cell are displayed on the cell surface, and the genetic information is preserved inside the cell. In phage display, peptides are displayed on the surface of a phage whose host is a microorganism, and the genetic information is preserved in the phage DNA. In ribosome display and mRNA display, which use a cell-free protein synthesis system, the genetic information is preserved via ribosomes and puromycin, respectively.

The peptide selection method preferred for analysis by the information acquisition device 100 is mRNA display. Among mRNA display methods, the DECODE method (International Publication No. 2018/168999) is particularly preferred. In the DECODE method, (i) RNA molecules are obtained from DNA by a transcription reaction. The DNA contains a promoter region and, downstream of the promoter region, a region encoding a peptide, and contains at least one 2'-modified nucleoside derivative on the 5' end side of the antisense strand. Next, (ii) a peptide acceptor molecule such as puromycin is attached to the 3' end of the RNA using a splint polynucleotide. Then, (iii) the RNA carrying the peptide acceptor molecule is translated to synthesize RNA-peptide complexes in which the RNA and the peptide it encodes are linked via the peptide acceptor molecule. Finally, (iv) complexes are selected from among the RNA-peptide complexes.

In step (iv), complexes that bind to the test substance via their peptide are selected. Repeating steps (i) to (iv) enriches, from the DNA library, the genetic information (that is, the DNA) encoding peptides that bind to the test substance.

The amino acid sequence obtained by translating the base sequence of DNA recovered by peptide selection is the amino acid sequence of a peptide that binds to the test substance. The DNA obtained by peptide selection is sequenced, for example with a next-generation sequencer, at 10,000 to several billion reads, several million to several hundred million reads, several hundred thousand to tens of millions of reads, or several hundred thousand to several million reads, and the amino acid sequences encoded by the resulting base sequences are collected. The information acquisition device 100 according to this embodiment analyzes the large number of amino acid sequences obtained by peptide selection in this way.

As shown in FIG. 1(A), the information acquisition device 100 has a configuration in which a storage unit 10, a RAM (Random Access Memory) 20, an input device 30, a display device 40, and a CPU (Central Processing Unit) 50 are connected by a bus 60.

The storage unit 10 includes non-volatile storage media such as a ROM (Read Only Memory), an HDD (Hard Disk Drive), and flash memory. In addition to various data and software programs, the storage unit 10 stores reference data 11, protein amino acid sequence data 12, and an information acquisition program 13. The various data include a table that defines values indicating the degree of similarity between amino acids. The software programs include multiple alignment software and clustering software.

The data stored in the storage unit 10 include reference data (first information) 11 indicating the distribution of similarity between partial sequences of proteins and a plurality of random amino acid sequences. A partial sequence of a protein is a part of the amino acid sequence of the protein. Partial sequences can be obtained from databases that store protein amino acid sequences, such as UniProt and the Protein Data Bank (PDB). An amino acid sequence database may also be obtained by translating the base sequences of the genomic DNA of various organisms into amino acids. To secure the search space for peptides that bind to the test substance, it is preferable to use a database that covers as many proteins as possible.

A partial sequence of a protein can have any length of two or more amino acids. When the length is 12 amino acids, for example, the partial sequences are obtained by shifting a 12-residue window one amino acid at a time from the N-terminus toward the C-terminus of the protein: the 12 amino acids starting at the N-terminal amino acid, the 12 amino acids starting at the second amino acid from the N-terminus, and so on. A protein consisting of 100 amino acids therefore contains 89 partial sequences of 12 amino acids.
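
The window enumeration described above can be sketched as follows. This is a minimal illustration only; the function name `partial_sequences` and the placeholder protein string are assumptions made for the example and do not appear in the original disclosure.

```python
def partial_sequences(protein: str, length: int = 12) -> list:
    """Return every window of `length` residues, shifting one residue
    at a time from the N-terminus toward the C-terminus."""
    if length < 2:
        raise ValueError("a partial sequence has at least two amino acids")
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


# Placeholder 100-residue protein (hypothetical, used only to count windows).
protein = "ACDEFGHIKLMNPQRSTVWY" * 5
windows = partial_sequences(protein)
print(len(windows))  # 100 - 12 + 1 = 89
```

A protein of L residues yields L - 12 + 1 windows, which reproduces the count of 89 given for a 100-residue protein.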

A random amino acid sequence is an amino acid sequence made up of randomly chosen amino acids. When the test substance is an antibody, the epitope is about 12 amino acids long. In that case it is therefore preferable to use, as the random amino acid sequences, a plurality of amino acid sequences each consisting of 12 or 8 arbitrary amino acids. When the length is 12 amino acids, there are 20^12 possible random amino acid sequences.
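
As a worked figure for the count above, assuming the 20 standard amino acids at each position:

```latex
20^{12} = 2^{12} \times 10^{12} = 4096 \times 10^{12} \approx 4.1 \times 10^{15},
\qquad
20^{8} = 2^{8} \times 10^{8} = 2.56 \times 10^{10}
```

Both counts are far larger than the library sizes and read counts mentioned in this description, so in practice only a subset of the possible sequences can be examined.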

The random amino acid sequences may also be a plurality of amino acid sequences obtained by translating random base sequences in which randomly chosen bases are arranged. When the amino acid sequence is 12 amino acids long, the peptide-coding region of the DNA subjected to the transcription reaction described above consists of the triplet "NNN", made up of three arbitrary bases "N", repeated 12 times. The random base sequence may restrict the choice of one or two of the bases in the triplet. In that case the peptide-coding region of the DNA is, for example, a base sequence in which the triplet "NNK" is repeated 12 times, where "K" is a base selected from G (guanine) and T (thymine).
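
The NNK scheme can be illustrated with a short sketch, assuming the standard genetic code. The helper names `nnk_codons`, `random_nnk_dna`, and `translate` are hypothetical and are not taken from the original disclosure.

```python
import random
from itertools import product
from typing import Optional

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def nnk_codons() -> list:
    """All codons matching NNK: N is any base, K is G or T."""
    return [a + b + k for a in BASES for b in BASES for k in "GT"]


def random_nnk_dna(n_codons: int = 12, seed: Optional[int] = None) -> str:
    """Draw a random coding region made of `n_codons` NNK triplets."""
    rng = random.Random(seed)
    codons = nnk_codons()
    return "".join(rng.choice(codons) for _ in range(n_codons))


def translate(dna: str) -> str:
    """Translate codon by codon; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


codons = nnk_codons()
encoded = {CODON_TABLE[c] for c in codons}
stops = [c for c in codons if CODON_TABLE[c] == "*"]
print(len(codons), len(encoded - {"*"}), stops)  # 32 20 ['TAG']
```

Under the standard code, restricting the third base to G or T leaves 32 codons that still cover all 20 amino acids while keeping only one of the three stop codons (TAG).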

The similarity between amino acid sequences can be evaluated by any known method. For example, the similarity is calculated from a table such as BLOSUM, PAM, or WAC that defines values indicating the degree of similarity between amino acids. When such a table is used, negative values may be set to 0 so that the difficulty of amino acid substitution during evolution is not taken into account. The table may also be one in which similarity is defined arbitrarily, for example according to the properties of the amino acids.

The creation of the reference data is illustrated for random amino acid sequences 12 amino acids long. As shown in FIG. 2, the similarity is calculated between partial sequences a, which include partial sequences a1 to a5 and so on that are parts of protein A (an entry in the protein database, part of whose amino acid sequence is shown in SEQ ID NO: 1), and amino acid sequences R, which include random amino acid sequences r1 to r5 and so on (SEQ ID NOs: 2 to 6, where Xaa in amino acid sequence r2 is selenocysteine (U)). To calculate the similarity, the table is consulted and, for each of the 12 amino acids of amino acid sequence r1, the similarity value with the amino acid at the corresponding position of partial sequence a1 is obtained. The sum of the similarity values over the 12 amino acids is taken as the similarity between partial sequence a1 and amino acid sequence r1. The similarities of amino acid sequences r2 to r5 to partial sequence a1 are obtained in the same way.
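
The position-by-position scoring described above can be sketched as follows. The table here is a toy example with made-up values, not BLOSUM, PAM, or WAC. The function names, the default score of 0 for pairs missing from the toy table, and the `clip_negative` switch (which mirrors the option of setting negative values to 0) are all assumptions for illustration.

```python
# Toy similarity table (hypothetical values, for illustration only).
TOY_TABLE = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5, ("R", "R"): 5,
    ("L", "I"): 2, ("K", "R"): 2, ("A", "K"): -1, ("L", "K"): -2,
}


def pair_score(x: str, y: str, table: dict, clip_negative: bool = True) -> int:
    """Look up the similarity value of one amino acid pair (order-insensitive)."""
    score = table.get((x, y), table.get((y, x), 0))  # missing pairs score 0 (assumption)
    return max(score, 0) if clip_negative else score


def similarity(partial: str, candidate: str, table: dict, clip_negative: bool = True) -> int:
    """Sum the per-position similarity values of two equal-length sequences."""
    if len(partial) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(p, c, table, clip_negative) for p, c in zip(partial, candidate))


print(similarity("ALKR", "AIRK", TOY_TABLE))                     # 4 + 2 + 2 + 2 = 10
print(similarity("ALK", "KKK", TOY_TABLE))                       # 0 + 0 + 5 = 5
print(similarity("ALK", "KKK", TOY_TABLE, clip_negative=False))  # -1 - 2 + 5 = 2
```

With a real substitution matrix the same summation applies; only the table lookup changes.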

When the similarities of amino acid sequences R to partial sequence a1 are collected into a histogram and normalized so that the total is 1, the probability distribution of the similarity of random amino acid sequences R to partial sequence a1 is obtained, as shown in FIG. 3. Calculating from the distribution in FIG. 3 the probability that the similarity is at least a given value yields the distribution shown in FIG. 4.
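
The two steps above, normalizing a histogram and then accumulating it from the high-similarity end, can be sketched as follows. The function names and the example scores are hypothetical.

```python
from collections import Counter


def similarity_pmf(scores: list) -> dict:
    """Histogram of similarity scores, normalized so the values sum to 1."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(scores)
    total = len(scores)
    return {s: counts[s] / total for s in sorted(counts)}


def tail_probability(pmf: dict) -> dict:
    """For each observed score s, the probability that the similarity is >= s."""
    tail, running = {}, 0.0
    for s in sorted(pmf, reverse=True):
        running += pmf[s]
        tail[s] = running
    return tail


scores = [1, 2, 2, 3]          # made-up similarity scores
pmf = similarity_pmf(scores)   # {1: 0.25, 2: 0.5, 3: 0.25}
tail = tail_probability(pmf)   # {3: 0.25, 2: 0.75, 1: 1.0}
print(pmf, tail)
```

The first dictionary corresponds to the kind of distribution described for FIG. 3 and the second to the kind described for FIG. 4.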

As shown in FIG. 5, the probability that the similarity of amino acid sequences R is at least a given value is obtained in the same way for partial sequences a2 (SEQ ID NO: 8), a3 (SEQ ID NO: 9), a4 (SEQ ID NO: 10), and a5 (SEQ ID NO: 11), each of which has its N-terminus shifted one amino acid further toward the C-terminus of protein A, starting from partial sequence a1 (SEQ ID NO: 7). The probabilities that random amino acid sequences R reach at least a given similarity, obtained for every partial sequence a in every protein stored in the protein database, constitute the reference data indicating the distribution of similarity.

Returning to FIG. 1(A), the protein amino acid sequence data 12 are data containing the amino acid sequences of the proteins stored in the protein database described above, or data containing amino acid sequences obtained by translating the base sequences in a genomic DNA sequence database. The amino acid sequences may be stored in full length for each protein, or as partial sequences a of a predetermined number of amino acids, for example 12. The amino acid sequence data 12 include, together with each amino acid sequence, information on the protein associated with it. When the amino acid sequences are partial sequences a, the amino acid sequence data 12 include, in association with each partial sequence a, information on the protein that contains it and information on its position in that protein (for example, n, where the n-th amino acid from the N-terminus of protein A is the N-terminal amino acid of partial sequence a).
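
One possible in-memory layout for such records is sketched below. The class name `PartialSequenceRecord`, its fields, and the identifier "protein_A" are hypothetical. The original describes only which pieces of information are associated, not how they are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialSequenceRecord:
    protein_id: str  # which protein the partial sequence comes from
    n: int           # 1-based position of the window's N-terminal residue in the protein
    sequence: str    # the partial sequence itself


def index_protein(protein_id: str, sequence: str, length: int = 12) -> list:
    """Build one record per window, keeping the protein and the position n."""
    return [
        PartialSequenceRecord(protein_id, i + 1, sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    ]


records = index_protein("protein_A", "ACDEFGHIKLMNPQ")  # placeholder 14-residue sequence
for record in records:
    print(record.protein_id, record.n, record.sequence)
```

Keeping n alongside each window lets a high-scoring partial sequence be mapped back to its location in the source protein.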

ＲＡＭ２０はＣＰＵ５０のメインメモリとして機能し、ＣＰＵ５０による情報取得プログラム１３の実行に際し、情報取得プログラム１３がＲＡＭ２０に展開される。ＲＡＭ２０には、入力装置３０から入力されたデータが一時的に記憶される。 The RAM 20 functions as the main memory of the CPU 50, and when the CPU 50 executes the information acquisition program 13, the information acquisition program 13 is deployed in the RAM 20. The RAM 20 temporarily stores data input from the input device 30.

The input device 30 is hardware through which the user inputs data to the information acquisition device 100. The input device 30 passes to the CPU 50 the amino acid sequences Y entered by the user, which comprise the amino acid sequences y1 to y5 and so on obtained by peptide selection. The amino acid sequences Y are those of peptides that bind to the test substance. They are obtained by repeating, until convergence, a cycle in which peptides generated from nucleic acids through transcription and translation, each remaining associated with its encoding nucleic acid, are bound to the test substance and recovered together with the nucleic acids. In the DECODE method with an antibody as the test substance, the amino acid sequences Y are obtained by converting each of the base sequences determined from about 100,000 to 200,000 reads. After identical amino acid sequences are excluded, the amino acid sequences Y number, for example, from about 10,000 to several tens of thousands. The CPU 50 stores the amino acid sequences Y obtained by peptide selection in the storage unit 10.

表示装置４０は、ＣＰＵ５０によるデータ解析の結果を出力するためのディスプレイである。ＣＰＵ５０は、記憶部１０に記憶された情報取得プログラム１３をＲＡＭ２０に読み出して、情報取得プログラム１３を実行することにより、以下に説明する機能を実現する。 The display device 40 is a display for outputting the results of data analysis by the CPU 50. The CPU 50 reads the information acquisition program 13 stored in the storage unit 10 into the RAM 20 and executes the information acquisition program 13 to realize the functions described below.

図１（Ｂ）は、ＣＰＵ５０が実現する機能を示すブロック図である。情報取得プログラム１３は、ＣＰＵ５０に取得部１及び出力部２としての機能を実現させる。 Figure 1 (B) is a block diagram showing the functions realized by the CPU 50. The information acquisition program 13 causes the CPU 50 to realize the functions of the acquisition unit 1 and the output unit 2.

取得部１は、基準データと、アミノ酸配列Ｙとタンパク質の部分配列ａとの間の類似度の分布を示す解析対象データ（第２の情報）と、の比較に基づいて、タンパク質における被験物質の結合部位に関する情報を取得する。 The acquisition unit 1 acquires information about the binding site of the test substance in the protein based on a comparison between the reference data and the analysis target data (second information) that indicates the distribution of similarity between the amino acid sequence Y and the partial sequence a of the protein.

The acquisition unit 1 obtains the analysis target data in the same manner as the reference data described above. Substituting the amino acid sequences y1 to y5 for the amino acid sequences r1 to r5 in FIG. 2, the acquisition unit 1 calculates the similarity between the partial sequence a1 contained in protein A and each of the amino acid sequences y1 to y5, and calculates the probability that the similarity of the amino acid sequences y1 to y5 to the partial sequence a1 is equal to or greater than a predetermined similarity. The acquisition unit 1 likewise calculates, for each of the partial sequences a2, a3, a4, and a5, the probability that the similarity of the amino acid sequences y1 to y5 is equal to or greater than the predetermined similarity.
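The probability calculation described here can be sketched as follows. The similarity measure below (a count of identical residues at matching positions) is only a stand-in assumption; the description computes similarity from a scoring table. The names `similarity` and `exceedance_distribution` are hypothetical.

```python
def similarity(a: str, b: str) -> int:
    """Stand-in similarity: number of positions with identical residues."""
    return sum(x == y for x, y in zip(a, b))


def exceedance_distribution(partial: str, peptides: list[str]) -> list[float]:
    """For each threshold s = 0..len(partial), return the fraction of
    peptides whose similarity to `partial` is at least s."""
    scores = [similarity(partial, p) for p in peptides]
    n = len(scores)
    return [sum(sc >= s for sc in scores) / n for s in range(len(partial) + 1)]


# Toy 4-residue example; real inputs would be 12-residue windows and
# thousands of selected sequences Y (or random sequences R for the reference).
dist = exceedance_distribution("ACDE", ["ACDE", "ACGG", "TTTT", "AAAA"])
print(dist)  # [1.0, 0.75, 0.5, 0.25, 0.25]
```

Running the same function on random sequences R gives the reference data, and running it on the selected sequences Y gives the analysis target data for the same partial sequence.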

図６は、図４に示す部分配列ａ１に対してランダムなアミノ酸配列Ｒが所定の類似度以上になる確率の分布に、部分配列ａ１に対してアミノ酸配列Ｙが所定の類似度以上になる確率の分布を重ねて表示した図である。タンパク質における被験物質の結合部位に対応する部分配列ａでは、部分配列ａと類似度の高いアミノ酸配列Ｙが多く得られるため、基準データよりも類似度の高い方にまで確率が分布する。取得部１は、基準データと解析対象データの分布間距離をスコアとして算出する。基準データとの距離が大きい、すなわちスコアが高い部分配列ａほど、被験物質が結合しやすいと言える。当該スコアによって、結合部位に対する被験物質の結合の特異性が評価できる。 Figure 6 shows the distribution of the probability that a random amino acid sequence R will have a predetermined similarity or higher to the partial sequence a1 shown in Figure 4, superimposed on the distribution of the probability that an amino acid sequence Y will have a predetermined similarity or higher to the partial sequence a1. In the partial sequence a corresponding to the binding site of the test substance in the protein, many amino acid sequences Y with high similarity to the partial sequence a are obtained, so the probability is distributed to the side with higher similarity than the reference data. The acquisition unit 1 calculates the distance between the distributions of the reference data and the data to be analyzed as a score. It can be said that the test substance is more likely to bind to a partial sequence a that is a larger distance from the reference data, i.e., a higher score. The specificity of the binding of the test substance to the binding site can be evaluated by the score.

The distance between the distributions can be calculated by a known method. Let x be the similarity, P(x) the distribution of the probability that the amino acid sequences Y have at least a given similarity, and Q(x) the distribution of the probability that the random amino acid sequences R have at least a given similarity. The score is then calculated using, for example, any one of the formulas listed in FIG. 7 or a combination of them. Note that the "distance" referred to here need not be a distance in the strict mathematical sense.
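The formulas of FIG. 7 are not reproduced in this text, so the sketch below uses two illustrative stand-ins that are assumptions rather than the formulas actually listed: the area between the two curves, and a sum of P(x)-weighted log ratios. Both take P(x) and Q(x) as lists indexed by the similarity x; `eps` is a hypothetical pseudocount that avoids division by zero.

```python
import math


def area_score(p: list[float], q: list[float]) -> float:
    """Signed area between the curves: sum over x of P(x) - Q(x)."""
    return sum(pi - qi for pi, qi in zip(p, q))


def log_ratio_score(p: list[float], q: list[float], eps: float = 1e-9) -> float:
    """Sum over x of P(x) * log(P(x) / Q(x)), with a small pseudocount."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Toy curves: P stays high at larger similarities, Q drops off quickly.
p = [1.0, 0.8, 0.5, 0.3]
q = [1.0, 0.4, 0.1, 0.0]
print(area_score(p, q), log_ratio_score(p, q))
```

Neither quantity is a metric in the mathematical sense, which is consistent with the remark that the "distance" need not be one.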

取得部１は、すべてのタンパク質に含まれるすべての部分配列ａの基準データと解析対象データとを比較し、スコアの高い部分配列ａを取得する。取得する部分配列ａは、最もスコアが高い部分配列ａであってもよいし、スコアの上位から複数個の部分配列ａであってもよい。取得部１は、アミノ酸配列データ１２を参照し、取得した部分配列ａを含むタンパク質の情報及び当該タンパク質における部分配列ａの位置に関する情報等を被験物質の結合部位に関する情報として取得する。 The acquisition unit 1 compares the reference data for all partial sequences a contained in all proteins with the data to be analyzed, and acquires partial sequences a with high scores. The partial sequence a to be acquired may be the partial sequence a with the highest score, or may be multiple partial sequences a with the highest scores. The acquisition unit 1 refers to the amino acid sequence data 12, and acquires information on the protein containing the acquired partial sequence a and information on the position of the partial sequence a in the protein, etc., as information on the binding site of the test substance.

被験物質がモノクローナル抗体の場合、取得部１は、最大のスコアであった部分配列ａを有するタンパク質の情報を取得する。被験物質がポリクローナル抗体の場合、取得部１は、取得した部分配列ａを含むタンパク質の情報とともに、当該タンパク質における複数の部分配列ａの位置に関する情報を取得する。 When the test substance is a monoclonal antibody, the acquisition unit 1 acquires information on the protein having the partial sequence a with the highest score. When the test substance is a polyclonal antibody, the acquisition unit 1 acquires information on the protein containing the acquired partial sequence a, as well as information on the positions of multiple partial sequences a in the protein.
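The selection of high-scoring partial sequences might look like the following sketch. The record type `ScoredWindow`, its field names, and the example values are hypothetical; only the idea of taking the single top hit (as for a monoclonal antibody) or several top hits (as for a polyclonal antibody) comes from the description.

```python
from dataclasses import dataclass


@dataclass
class ScoredWindow:
    protein: str    # identifier of the protein containing the window
    position: int   # n: 1-based position of the window's N-terminal residue
    sequence: str   # the partial sequence a
    score: float    # distance between analysis and reference distributions


def top_windows(windows: list[ScoredWindow], n: int = 1) -> list[ScoredWindow]:
    """Return the n highest-scoring partial sequences, best first."""
    return sorted(windows, key=lambda w: w.score, reverse=True)[:n]


hits = [
    ScoredWindow("B", 40, "GSGSGSGSGSGS", 0.2),
    ScoredWindow("A", 7, "AKDYKDDDDKAA", 3.1),
    ScoredWindow("A", 8, "KDYKDDDDKAAG", 2.7),
]
best = top_windows(hits, 1)      # monoclonal case: single best window
several = top_windows(hits, 2)   # polyclonal case: several windows
```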

取得部１は、タンパク質における被験物質の結合部位に関する情報として結合部位のアミノ酸配列を予測する。例えば、取得部１は、スコアの上位から複数個の部分配列ａをアミノ酸配列の類似性に基づいてクラスタリングし、クラスターごとにマルチプルアライメントを作成する。取得部１は、アライメントされた部分配列ａの各位置において最も高い収束率を示したアミノ酸を当該位置のアミノ酸とする。図８は、アライメントされた部分配列ａにおけるアミノ酸の位置と当該位置におけるアミノ酸の出現頻度とが対応づけられたテーブルを示す。図８に示すように、取得部１は、アライメントされた部分配列ａの各位置におけるアミノ酸の出現頻度の高いアミノ酸を当該位置のアミノ酸としてアミノ酸配列を予測してもよい。なお、取得部１は、収束率が所定の値よりも低い位置をブランクとしてアミノ酸配列を予測してもよい。 The acquisition unit 1 predicts the amino acid sequence of the binding site as information on the binding site of the test substance in the protein. For example, the acquisition unit 1 clusters multiple partial sequences a from the top of the score based on the similarity of the amino acid sequences, and creates a multiple alignment for each cluster. The acquisition unit 1 sets the amino acid that shows the highest convergence rate at each position of the aligned partial sequence a as the amino acid at that position. FIG. 8 shows a table in which the position of an amino acid in the aligned partial sequence a is associated with the frequency of occurrence of the amino acid at that position. As shown in FIG. 8, the acquisition unit 1 may predict the amino acid sequence by setting the amino acid with a high frequency of occurrence at each position of the aligned partial sequence a as the amino acid at that position. The acquisition unit 1 may also predict the amino acid sequence by setting the position with a convergence rate lower than a predetermined value as a blank.
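The per-position frequency table and the blanking of poorly converged positions can be sketched as below. This assumes the sequences are already aligned to equal length; the 0.5 cutoff and the blank character `x` are arbitrary illustrative choices, and the alignment step itself (for example with Clustal X) is not shown.

```python
from collections import Counter


def consensus(aligned: list[str], min_rate: float = 0.5, blank: str = "x"):
    """Return (consensus string, per-position frequency table).

    At each column the most frequent residue is used if its frequency is
    at least `min_rate`; otherwise the position is left blank."""
    table = []
    out = []
    for column in zip(*aligned):
        counts = Counter(column)
        residue, count = counts.most_common(1)[0]
        table.append({aa: c / len(column) for aa, c in counts.items()})
        out.append(residue if count / len(column) >= min_rate else blank)
    return "".join(out), table


motif, freq = consensus(["DYKDE", "DYKAE", "DYKGE", "DYRTE"])
print(motif)  # DYKxE
```

The returned table plays the role of the position-by-residue frequency table of FIG. 8.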

また、取得部１は、基準データと、アミノ酸配列Ｙとタンパク質の部分配列との間の類似度と、の比較によって抽出したペプチドのアミノ酸配列に基づいて結合部位のアミノ酸配列を予測する。この場合、基準データには、部分配列ａに対するランダムなアミノ酸配列Ｒの類似度の確率の分布（図４）において、確率が所定の値ｋより小さい類似度の範囲で最小の類似度Ｓが含まれる。記憶部１０は、あらかじめ基準データ１１として、タンパク質のデータベースに格納されたすべてのタンパク質に含まれるすべての部分配列ａそれぞれに対応付けられた類似度Ｓを記憶している。 The acquisition unit 1 also predicts the amino acid sequence of the binding site based on the amino acid sequence of the peptide extracted by comparing the reference data with the similarity between the amino acid sequence Y and the partial sequence of the protein. In this case, the reference data includes the minimum similarity S in the range of similarity where the probability is smaller than a predetermined value k in the distribution of the similarity probability of the random amino acid sequence R to the partial sequence a (FIG. 4). The storage unit 10 stores, in advance as the reference data 11, the similarity S associated with each of all partial sequences a contained in all proteins stored in the protein database.

The acquisition unit 1 calculates the similarity between each amino acid sequence Y and a partial sequence a of a protein. Referring to the storage unit 10, the acquisition unit 1 stores in the storage unit 10 those amino acid sequences Y whose similarity is equal to or greater than the similarity S associated with that partial sequence a. The acquisition unit 1 creates a multiple alignment of the amino acid sequences Y with a similarity of S or more and predicts the amino acid sequence as described above. The amino acid sequences Y may be clustered before the multiple alignment, with a multiple alignment created for each cluster. Alternatively, the acquisition unit 1 may skip the multiple alignment and, for a specific partial sequence only, cluster the amino acid sequences Y whose similarity is S or more or whose probability is k or less.
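The threshold-based extraction can be sketched as follows. The sketch assumes that Q(x), the reference probability of reaching similarity x or more, is available as a list indexed by x, and it again uses identity counting as a stand-in similarity. The function names and the value k = 0.05 are hypothetical.

```python
def similarity(a: str, b: str) -> int:
    """Stand-in similarity: number of positions with identical residues."""
    return sum(x == y for x, y in zip(a, b))


def threshold_similarity(q: list[float], k: float):
    """Smallest similarity S whose reference probability Q(S) is below k.
    Returns None if no similarity satisfies the condition."""
    for s, prob in enumerate(q):
        if prob < k:
            return s
    return None


def select_peptides(partial: str, peptides: list[str], s_min: int) -> list[str]:
    """Keep the sequences Y whose similarity to `partial` is at least S."""
    return [p for p in peptides if similarity(partial, p) >= s_min]


q_ref = [1.0, 0.6, 0.2, 0.01, 0.0]          # toy reference curve for one window
S = threshold_similarity(q_ref, k=0.05)      # 3
kept = select_peptides("ACDE", ["ACDE", "ACDG", "ACGG", "TTTT"], S)
```

The retained sequences would then go on to clustering and multiple alignment.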

取得部１は、被験物質の結合部位に関する情報を出力部２に入力する。出力部２は、被験物質の結合部位に関する情報を表示装置４０に表示する。 The acquisition unit 1 inputs information about the binding site of the test substance to the output unit 2. The output unit 2 displays the information about the binding site of the test substance on the display device 40.

続いて、情報取得装置１００による情報取得処理を図９に示すフローチャートを参照して説明する。 Next, the information acquisition process performed by the information acquisition device 100 will be described with reference to the flowchart shown in FIG.

The acquisition unit 1 waits for the user to input the analysis target data via the input device 30 (step S1; No). When the analysis target data is input (step S1; Yes), the acquisition unit 1 refers to the storage unit 10, compares the analysis target data with the reference data to calculate scores, and acquires the partial sequences a with high scores (step S2). The acquisition unit 1 refers to the amino acid sequence data 12 and acquires information about the binding site of the test substance, including information about the protein containing each acquired partial sequence a and information about the position of the partial sequence a in that protein (step S3). The output unit 2 displays the information about the binding site of the test substance on the display device 40 (step S4). The acquisition unit 1 then ends the information acquisition process.

以上詳細に説明したように、本実施の形態に係る情報取得装置１００は、タンパク質の部分配列ａと複数のランダムなアミノ酸配列Ｒとの間の類似度の分布を示す基準データと、ペプチドセレクションで収束した被験物質に結合するペプチドのアミノ酸配列Ｙと部分配列ａとの間の類似度の分布と、の比較によってタンパク質における被験物質の結合部位に関する情報を取得する。これにより、結合部位に対する被験物質の結合の特異性を評価できるため、タンパク質における被験物質の結合部位に関する情報を高い精度で得ることができる。 As described above in detail, the information acquisition device 100 according to this embodiment acquires information about the binding site of the test substance in the protein by comparing reference data showing the distribution of similarity between partial sequence a of the protein and multiple random amino acid sequences R with the distribution of similarity between partial sequence a and amino acid sequence Y of a peptide that binds to the test substance converged upon by peptide selection. This makes it possible to evaluate the specificity of the binding of the test substance to the binding site, thereby making it possible to obtain information about the binding site of the test substance in the protein with high accuracy.

また、情報取得装置１００は、被験物質に結合するペプチドのアミノ酸配列Ｙと部分配列ａとの間の類似度を基準データと比較することで選抜したペプチドのアミノ酸配列に基づいて結合部位のアミノ酸配列を予測することとした。こうすることで、結合部位のアミノ酸配列の予測精度を高めることができる。情報取得装置１００は、図８に例示される、スコアの高い部分配列ａの各位置のアミノ酸の出現頻度のテーブルを用いることで、被験物質の結合部位の特異性を予測することができる。 In addition, the information acquisition device 100 predicts the amino acid sequence of the binding site based on the amino acid sequence of the selected peptide by comparing the similarity between the amino acid sequence Y of the peptide that binds to the test substance and the partial sequence a with reference data. In this way, the prediction accuracy of the amino acid sequence of the binding site can be improved. The information acquisition device 100 can predict the specificity of the binding site of the test substance by using a table of the occurrence frequency of amino acids at each position of the partial sequence a with high scores, as exemplified in FIG. 8.

本実施に形態では、タンパク質の部分配列ａに対してランダムなアミノ酸配列Ｒが所定の類似度以上になる確率を類似度の分布として用いたが、類似度の分布はこれに限らない。タンパク質の部分配列ａと複数のアミノ酸配列Ｒとの間の類似度の分布は、タンパク質の部分配列ａに対してアミノ酸配列Ｒの類似度の平均値、最頻値又は中央値であってもよい。 In this embodiment, the probability that a random amino acid sequence R will have a predetermined similarity or higher to a partial protein sequence a is used as the distribution of similarity, but the distribution of similarity is not limited to this. The distribution of similarity between a partial protein sequence a and multiple amino acid sequences R may be the average, mode, or median of the similarity of amino acid sequence R to partial protein sequence a.

なお、結合部位に関する情報をより正確に得るために、ペプチドセレクションで収束した被験物質に結合するペプチドのアミノ酸配列Ｙと部分配列ａとの間の類似度にノイズ除去処理を加えてもよい。ノイズ除去処理は、公知のものが適用でき、例えば平均フィルタ等である。 In order to obtain more accurate information about the binding site, a noise removal process may be applied to the similarity between the amino acid sequence Y of the peptide that binds to the test substance and the partial sequence a, which is converged upon by peptide selection. The noise removal process may be a known process, such as an average filter.
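A mean (average) filter of the kind mentioned can be sketched as below. Applying it to a series of per-position values along a protein is an assumption made for illustration, since the text does not specify along which axis the similarities are smoothed. The window size of 3 is likewise arbitrary.

```python
def mean_filter(values: list[float], window: int = 3) -> list[float]:
    """Replace each value with the mean of its neighbourhood.
    Windows are truncated at the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


print(mean_filter([0, 0, 9, 0, 0]))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```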

また、上記の確率の値ｋ及び類似度Ｓは、解析に応じて適宜設定される。アミノ酸配列間の類似度の算出に使用するテーブル、スコアによる部分配列ａの順位の付け方等も、解析対象の生物種等に応じて設定される。タンパク質のデータベースは１つの生物種に限らず、複数の生物種のタンパク質のデータベースを用いてもよい。複数の生物種由来のタンパク質のデータベースを用いることで被験物質としての抗体の種間の交差性を予測することができる。 The probability value k and similarity S are set appropriately depending on the analysis. The table used to calculate the similarity between amino acid sequences, the method of ranking partial sequence a by score, etc. are also set depending on the biological species to be analyzed. The protein database is not limited to one biological species, and a database of proteins from multiple biological species may be used. By using a database of proteins derived from multiple biological species, it is possible to predict the interspecies cross-reactivity of the antibody as the test substance.

また、情報取得装置１００は、血清、血漿、血液、リンパ液及び髄液等のサンプルに含まれる抗体等の成分を被験物質として実施したペプチドセレクションで得られたペプチドのアミノ酸配列Ｙの解析にも適している。例えば、免疫を惹起したヒトの血清に含まれる抗体を被験物質とすることで血清中の複数種の抗体に対する抗原を同定し、さらに抗原における被験物質の結合部位に関する情報を網羅的に収集できる。なお、被験物質は、化合物、アプタマー、核酸、ペプチド及びタンパク質等、タンパク質に結合し得るものであれば特に限定されない。また、情報取得装置１００は、血清、血漿、血液、リンパ液及び髄液等のサンプルに含まれるあらゆる成分を被験物質として実施したペプチドセレクションで得られたペプチドのアミノ酸配列Ｙを解析してもよい。 The information acquisition device 100 is also suitable for analyzing the amino acid sequence Y of a peptide obtained by peptide selection performed using components such as antibodies contained in samples such as serum, plasma, blood, lymph, and cerebrospinal fluid as test substances. For example, by using antibodies contained in the serum of an immunized human as test substances, antigens for multiple types of antibodies in the serum can be identified, and information on the binding sites of the test substances in the antigens can be comprehensively collected. The test substances are not particularly limited as long as they can bind to proteins, such as compounds, aptamers, nucleic acids, peptides, and proteins. The information acquisition device 100 may also analyze the amino acid sequence Y of a peptide obtained by peptide selection performed using any components contained in samples such as serum, plasma, blood, lymph, and cerebrospinal fluid as test substances.

なお、マルチプルアライメント用ソフトウェアとしては、累進法、反復改善法及び動的計画法等を利用した公知の種々のソフトウェアが使用できる。マルチプルアライメント用ソフトウェアは、例えば、ＣｌｕｓｔａｌＸ、ＣｌｕｓｔａｌＷ、ＭＵＳＣＬＥ、Ｔ－Ｃｏｆｆｅｅ、ＰａｒａｌｌｅｌＰＲＲＮ、ＭｕｌｔＡｌｉｎ、ＭＳＡ、Ｍａｔｃｈ－Ｂｏｘ、ＤＩＡＬＩＧＮ及びＡｌｉＢｅｅ等である。クラスタリング用ソフトウェアについても最短距離法、最長距離法、群平均法、最小分散法、重心法、重み付き平均法、メジアン法及びＫ－ｍｅａｎｓ法等の公知の方法を使用したソフトウェアが使用できる。 As software for multiple alignment, various known software using progressive methods, iterative improvement methods, dynamic programming, etc. can be used. Examples of software for multiple alignment include Clustal X, Clustal W, MUSCLE, T-Coffee, Parallel PRRN, MultAlin, MSA, Match-Box, DIALIGN, and AliBee. As for clustering software, software using known methods such as the shortest distance method, the longest distance method, the group average method, the minimum variance method, the center of gravity method, the weighted average method, the median method, and the K-means method can be used.

なお、上述の基準データ１１、タンパク質のアミノ酸配列データ１２、情報取得プログラム１３及びその他のソフトウェアプログラムは、ＣＤ－ＲＯＭ（ＣｏｍｐａｃｔＤｉｓｃＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）、光磁気ディスク（Ｍａｇｎｅｔｏ－ＯｐｔｉｃａｌＤｉｓｃ）、ＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）メモリ、メモリカード及びＨＤＤ等のコンピュータ読み取り可能な記録媒体に格納して配布することが可能である。そして、情報取得プログラム１３及びその他のソフトウェアプログラムを特定の又は汎用のコンピュータにインストールすることによって、当該コンピュータを情報取得装置１００として機能させることが可能である。また、基準データ１１、タンパク質のアミノ酸配列データ１２、情報取得プログラム１３及びその他のソフトウェアプログラムをインターネット上の他のサーバが有する記憶装置に格納しておき、当該サーバから基準データ１１、タンパク質のアミノ酸配列データ１２、情報取得プログラム１３及びその他のソフトウェアプログラムがダウンロードされるようにしてもよい。 The above-mentioned reference data 11, protein amino acid sequence data 12, information acquisition program 13, and other software programs can be stored and distributed on computer-readable recording media such as a CD-ROM (Compact Disc Read Only Memory), a DVD (Digital Versatile Disc), a magneto-optical disc (Magneto-Optical Disc), a USB (Universal Serial Bus) memory, a memory card, and a HDD. By installing the information acquisition program 13 and other software programs in a specific or general-purpose computer, the computer can function as the information acquisition device 100. In addition, the reference data 11, the protein amino acid sequence data 12, the information acquisition program 13, and other software programs may be stored in a storage device owned by another server on the Internet, and the reference data 11, the protein amino acid sequence data 12, the information acquisition program 13, and other software programs may be downloaded from that server.

以下の実施例により、本発明をさらに具体的に説明するが、本発明は当該実施例によって限定されるものではない。 The present invention will be explained in more detail with reference to the following examples, but the present invention is not limited to these examples.

(Example 1: Prediction of epitope motifs using the DECODE method)
(Construction of the library)
The DECODE method was performed as follows. An antibody is said to recognize about 5 amino acids of its antigen, and for a linear amino acid sequence the recognized region usually falls within about 10 amino acids, so a template DNA library was prepared in which 12 amino acids were randomized. The codons were randomized as NNK (K = G/T), so that the only stop codon that can occur within the random sequence of the DNA template is UAG (amber). The 500 μL PCR mixture was adjusted to contain 0.05 μM of the template DNA library so that the library size in the first round of selection was 1.5 × 10¹³. The base sequence of the template DNA was CCTAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATG(NNK)nTGCGGCAGCGGCAGCGGCAGCTACTTTGATCCGCCGACC, with n = 12. N is any of A, T, G, and C, and K is T or G. The base sequence of the template DNA when n = 1 and K is T is shown in SEQ ID NO: 12.
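Two of the figures in this paragraph can be checked with a short calculation: the number of template molecules in 500 μL at 0.05 μM, and the claim that NNK randomization leaves only one possible stop codon. The sketch uses the standard genetic code written in the usual compact T, C, A, G order; the variable names are illustrative.

```python
from itertools import product

AVOGADRO = 6.02214076e23  # molecules per mole

# 0.05 uM in 500 uL -> number of template DNA molecules
library_size = 0.05e-6 * 500e-6 * AVOGADRO
print(f"{library_size:.2e}")  # about 1.5e13

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# NNK: N is any base, K is G or T
nnk_codons = [a + b + k for a in "ACGT" for b in "ACGT" for k in "GT"]
stops = [c for c in nnk_codons if CODON_TABLE[c] == "*"]
encoded = {CODON_TABLE[c] for c in nnk_codons} - {"*"}

print(len(nnk_codons), stops, len(encoded))  # 32 ['TAG'] 20
```

The 32 NNK codons still cover all 20 amino acids, and TAG (UAG in the mRNA) is the only stop codon among them.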

To check that the selection system is stable and functional, selection should be performed with a fixed sequence and the peptide recovery efficiency confirmed. Therefore, the fixed-sequence peptide mc1', which binds specifically to an anti-FLAG antibody, was used as a positive control to confirm in each round that the selection system was functioning. When mc1' was screened against the anti-FLAG antibody by the DECODE method and the amount of recovered cDNA-peptide was measured by qPCR, the Ct value was stable at about 10. The base sequence of the template mc1' is CCTAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATGAAGTACTCCCCAACCGACTGCAAGAAGGACTACAAGGACGACGACGACAAGTGCGGCAGCGGCAGCGGCAGCTAGGACGGGGGGCGGAAA (SEQ ID NO: 13).

(Amplification of template DNA)
The 500 μL PCR mixture shown in Table 1 was prepared and the template DNA library was amplified. The prepared PCR mixture was incubated in a thermal cycler at 95°C for 3 minutes, and the template DNA was then amplified by four cycles of 95°C (10 seconds), 58°C (10 seconds), and 75°C (30 seconds). The base sequence of the forward primer (P1) is CCTAATACGACTCACTATAGGGTTAACTTTAAGAAGGAGATATACATATG (SEQ ID NO: 14). The base sequence of the reverse primer (P2 (antigen) OMe) is ggTCGGCGGATCAAAGTAG (SEQ ID NO: 15).

(Transcription of the template DNA library and ligation of Pu-DNA)
500 μL of transcription mixture buffer and 500 μL of template DNA were mixed to prepare a 1000 μL transcription mixture, and the amplified DNA library was transcribed with 50 mU/μL T7 RNA polymerase (5 μL). The composition of the transcription mixture buffer (TC mix) was, at final concentrations, 40 mM HEPES-KOH (pH 7.6), 20 mM MgCl₂, 2 mM spermidine, 5 mM DTT, and 2.5 mM NTPs. The transcription mixture was reacted at 37°C for 40 minutes and then left at 72°C for 5 minutes to inactivate the T7 RNA polymerase. The resulting transcription products were electrophoresed (180 V, 40 minutes) on a 10% acrylamide gel containing 7 M urea to confirm that an mRNA library had been produced.

Next, under buffer conditions of final concentrations of 50 mM Tris-HCl (pH 7.5), 10 mM MgCl₂, 10 mM DTT, and 1 mM ATP, the transcription product was mixed with 5 μM Pu-DNA (5'-[PHO]CTCCCGCCCCCCGTCC[SpC18]₅CC[Puromycin]; the base sequence from the 5' end to the spacer is shown in SEQ ID NO: 16) and 5 μM splint DNA (5'-GGGCGGGAGGGTCGGCGGATCAA (SEQ ID NO: 17)) to prepare a 500 μL ligation mixture. The ligation mixture was heated at 95°C for 1 minute, left at 75°C for 30 seconds, and then cooled to 25°C at a constant gradient of 1°C per 15 seconds to anneal the mRNA, the Pu-DNA, and the splint DNA. 35 U of T4 DNA ligase was then added, the ligation reaction was allowed to proceed at 37°C for 1 hour, and the mixture was left at 4°C. The resulting product was electrophoresed (180 V, 40 minutes) on a 10% acrylamide gel containing 7 M urea to confirm that the mRNA library had been ligated to the Pu-DNA. The Pu-DNA-ligated mRNA library was purified with the RNA purification reagent kit Agencourt AMPure (trademark) XP, and its concentration was determined.

(Cell-free translation using a custom PURE system)
The Pu-DNA-ligated mRNA library was translated in a cell-free translation system (PURE system) to obtain a peptide library. 2.4 μL of the 0.6 μM ligated sample, 0.5 μL of Solution B, 6 μL of Solution A, and 3 μL of stock buffer were combined to prepare 11.9 μL of PURE mixture. The PURE mixture was reacted at 37°C for 1 hour.

The composition of Solution B is shown in Table 2. The composition of the stock buffer is 50 mM HEPES-KOH (pH 7.6), 100 mM KCl, 10 mM MgCl, and 30% glycerol.

The composition of the factor mix is shown in Table 3.

The composition of Solution A is shown in Table 4.

The composition of the NTP creatine phosphate mixture is shown in Table 5.

The composition of the PURE buffer is shown in Table 6.

(Preparation of Tag antibody-immobilized beads)
The Tag antibody was immobilized on protein G beads in the first and third rounds of selection and on protein A beads in the second round. Before use, the beads were washed with 500 μL of wash buffer (50 mM Tris-HCl, pH 8.0, 500 mM NaCl, 1% Triton, and 0.01% Tween 20). 1 μL of IgG antibody was added to 2.5 μL of beads, and the mixture was shaken for 30 minutes to bind the IgG antibody to the beads.

(Binding reaction of the peptide library to Tag antibody-immobilized beads)
11.9 μL of the translation product and 25 μL of binding buffer (50 mM Tris-HCl, pH 8.0, and 10 mM EDTA) were added to the Tag antibody-immobilized beads, and the mixture was shaken for 30 minutes to bind the peptide library to the beads (positive selection). The supernatant was removed, and the beads were collected and washed 10 times with wash buffer to obtain a peptide library that binds specifically to the IgG.

(Reverse transcription)
The mRNA present on the protein G beads or protein A beads was reverse transcribed into cDNA using ProtoScript II reverse transcriptase. To give a final reverse transcription reaction volume of 44.5 μL, 40 μL of RT mix, 4.25 μL of RT(-) (50 mM Tris-HCl (pH 8.0) and 75 mM KCl), and 0.25 μL of ProtoScript II were mixed with the beads, and the reverse transcription reaction was carried out at 37°C for 40 minutes. The RT mix is ProtoScript buffer containing 0.2 mM dNTPs, 10 mM DTT, and 0.2 μM RT primer (P2_ver2, GGTCGGCGGATCAAAGTAGCTGCCGCTGCCGCTGCCGCA (SEQ ID NO: 18)).

(Elution)
The Tag antibody was held in 10 μL of phosphate buffer at 95°C for 3 minutes to elute the peptide library. After elution, the supernatant was collected. The beads were then washed with 20 μL of ultrapure water, and that supernatant was also collected.

(Quantification of the recovered cDNA by qPCR)
The cDNA linked to the recovered peptides was quantified by qPCR to determine the optimal number of PCR amplification cycles for the next round. 7 μL of the qPCR mixture shown in Table 7 was dispensed into each well of a 384-well plate, and 0.5 μL of cDNA was added to each well. The plate was incubated at 95°C for 3 minutes and then at 95°C for 1 minute, followed by 40 cycles of two steps, 95°C (10 seconds) and 60°C (30 seconds), to amplify the cDNA. The base sequence of the reverse primer (P2 (antigen)) is GGTCGGCGGATCAAAGTAGCTGCCGCTGCCGCTGCCGCA (SEQ ID NO: 19).

(PCR amplification of the recovered DNA)
20 μL of the amplified template DNA was added to the PCR mixture shown in Table 8, and the cDNA library was amplified with Phusion DNA polymerase. The prepared PCR mixture was incubated in a thermal cycler at 95°C for 3 minutes, and the template DNA was then amplified by repeating the temperature steps of 95°C, 58°C, and 75°C for the number of cycles determined by qPCR. The optimal number of cycles (N) was determined by confirming amplification by electrophoresis on a 1% agarose gel. The template DNA library amplified under the determined PCR conditions was checked by electrophoresis on a 1% agarose gel. After sufficient amplification was confirmed, the library was purified with Agencourt AMPure (trademark) XP.

(Second-round transcription)
A transcription mixture was prepared on a 10 μL scale, and 0.1 μM of the amplified DNA library was transcribed with 50 mU/μL T7 RNA polymerase. The transcription mixture contained 5 mM NTPs, 5 μM DTT, and 20 μM MgCl₂. After the transcription reaction at 37°C for 1 hour, the mixture was held at 75°C for 5 minutes to inactivate the T7 RNA polymerase.

(Second-round ligation of mRNA and Pu-DNA)
5 μM of the transcription product was mixed with 1 mM ATP, 10 μM Pu-DNA, and 10 μM splint DNA, and a ligation mixture was prepared on an 8 μL scale in 1× ligation buffer. The ligation mixture was heated at 95°C for 1 minute and then cooled to 25°C at a constant gradient over 15 minutes to anneal the mRNA, Pu-DNA, and splint DNA. T4 ligase was added, and the reaction was carried out at 37°C for 1 hour.

Each of 161 research antibodies (144 monoclonal and 17 polyclonal) was immobilized on the beads described above, and the DECODE method according to this example was carried out. For the cDNA of the peptides screened by the DECODE method, about 1 million reads per antibody were sequenced on a HiSeq 3000 next-generation sequencer (Illumina). The resulting base sequences were translated into amino acid sequences, which were then clustered using Java. The BLOSUM62 matrix was used as the substitution score function. However, because this study does not take into account how unlikely an amino acid substitution is in evolutionary terms, a version of the matrix in which negative values were set to 0 was used. For each cluster obtained, an alignment was built with the sequence alignment software Clustal X, and a motif was then generated with WebLogo.
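The translation and scoring steps described above can be illustrated with a minimal Python sketch. It is not the Java program used in the example. The codon table is the standard genetic code; `TOY_MATRIX` is a hypothetical handful of scores loosely modelled on a substitution matrix and is not the actual BLOSUM62 table, which would have to be loaded from a trusted source. The position-by-position, ungapped scoring is also an assumption made for illustration.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA read in frame 0, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" marks an unreadable codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Hypothetical toy scores (NOT the real BLOSUM62 matrix).
TOY_MATRIX = {
    ("D", "D"): 6, ("F", "F"): 6, ("L", "L"): 4,
    ("F", "Y"): 3, ("D", "F"): -3,
}


def clipped_score(a: str, b: str, matrix: dict) -> int:
    """Look up a symmetric substitution score and replace negatives with 0."""
    raw = matrix.get((a, b), matrix.get((b, a), 0))
    return max(0, raw)


def similarity(p: str, q: str, matrix: dict) -> int:
    """Sum the clipped scores position by position (ungapped comparison)."""
    return sum(clipped_score(a, b, matrix) for a, b in zip(p, q))


peptide = translate("ATGGATTTTCTGTTT")
print(peptide)
print(similarity("DFLF", "DYLF", TOY_MATRIX))
```

Clipping negative scores to 0 means a mismatch never cancels the contribution of matching positions, so two peptides that share a short motif keep a high score even when the flanking residues differ.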

(Results)
When the amount of cDNA-peptide complex recovered in each round was quantified by qPCR, convergence was observed in the third round for most antibodies.

FIG. 10(A) shows the amino acid sequence of c-fos (SEQ ID NO: 20). For the monoclonal antibodies, FIG. 10(B) shows the motifs with the highest convergence rate among the peptides that converged for five anti-c-fos antibodies. The motif obtained for each anti-c-fos antibody matched a part of c-fos, as shown in FIG. 10(A).

FIG. 11(A) shows the amino acid sequence of NeuN (SEQ ID NO: 21). FIG. 11(B) shows the motif generated from the peptides that converged for the anti-NeuN antibody. The motif obtained for the anti-NeuN antibody matched a part of NeuN, as shown in FIG. 11(A). FIG. 12(A) shows the amino acid sequence of TH (SEQ ID NO: 22). FIG. 12(B) shows the motif generated from the peptides that converged for the anti-TH antibody. The motif obtained for the anti-TH antibody matched a part of TH, as shown in FIG. 12(A).

As described above, every motif obtained in this example matched a part of the sequence of the target protein. From the relative abundance of the amino acid sequences, it was possible to predict which amino acid residues of the epitope are particularly important for antigen recognition by the antibody. In addition, as shown in FIG. 10(B), independent DECODE runs on the same clone C1 of the antibody targeting c-fos reproducibly yielded the same motif.

For the polyclonal antibodies, the motifs generated from the peptides that converged for the anti-c-fos antibody (Sigma) and the anti-DAT antibody (Sigma) are shown in FIG. 13 and FIG. 14, respectively. As shown in FIG. 13(B), the motifs obtained fell mainly into four clusters. Every motif matched a part of the c-fos protein sequence shown in FIG. 13(A). Moreover, two independent analyses of the same anti-c-fos antibody reproducibly yielded the same motifs.

As shown in FIG. 14(B), specific motifs were also obtained for the anti-DAT antibody. The motifs obtained fell mainly into two clusters. Both motifs matched a part of the DAT protein sequence (SEQ ID NO: 23) (see FIG. 14(A)). For the anti-DAT antibody as well, two independent analyses of the same antibody reproducibly yielded the same motifs.

(Example 2: Evaluation of the accuracy of the DECODE method by ELISA)
Whether the motifs obtained by the DECODE method were actually the epitopes recognized by the antibodies was tested by ELISA. The antigen-antibody reaction was examined using five anti-c-fos antibodies from different clones (monoclonal antibodies C1, C2, C4, and C5 and polyclonal antibody C7) that recognize the c-Fos protein.

Mutations were introduced with various mutagenic primers, using PrimeSTAR Max, into the pMU2 plasmid carrying the cloned gene for the full-length wild-type c-fos protein. The vectors for the resulting c-fos protein mutants and for the wild-type c-fos protein were transfected into HEK293T cells and expressed. After expression, the cells were disrupted and the lysates were collected.

Lysates of HEK293T cells expressing each mutant c-fos protein or the wild-type c-fos protein were immobilized on a 384-well plate: 12.5 μL/well was shaken for 1 hour and then stored overnight at 4°C. Next, the wells were filled with 120 μL/well of Blocking One diluted 1/5 and left to stand at room temperature for 1 hour. After three washes with TPBS (PBS containing 0.1% Tween 20), serially diluted anti-c-fos antibodies (C1, C2, C4, C5, and C7) were added as primary antibodies and allowed to bind to the c-fos protein by shaking at room temperature for 1 hour, followed by three washes with TPBS. Horseradish peroxidase (HRP)-labeled secondary antibodies (mouse, rabbit, goat) diluted 1/1000 were then added and allowed to bind by shaking at room temperature for 1 hour. After the secondary antibody reaction, the wells were washed 12 times with TPBS. 25 μL of ELISA POD substrate TMB (3,3',5,5'-tetramethylbenzidine) chromogenic substrate solution was added, color was allowed to develop fully, and the reaction was stopped by adding 50 μL of 0.1 M H₂SO₄. Absorbance at 450 nm was measured with a microplate reader.

Because the c-fos protein was immobilized together with the whole HEK293T lysate, the absorbance of a lysate from HEK293T cells not expressing c-fos protein was subtracted as a blank. To correct for differences in the expression level of each mutant and of the wild-type c-fos protein in the immobilized lysates, the absorbance of the antigen-antibody reaction for anti-c-fos antibodies C2, C4, C5, and C7 was divided by the saturation absorbance of anti-c-fos antibody C1, and the absorbance for anti-c-fos antibody C1 was divided by the saturation absorbance of anti-c-fos antibody C2; the resulting values were taken as the normalized absorbance. The normalized absorbance was plotted against each antibody concentration to draw a saturation curve. The measured values were fitted by the least squares method to the Michaelis-Menten equation below, in which nM is the antibody concentration, and the maximum normalized absorbance (Abs.max) and the Km value were calculated.
Normalized absorbance = Abs.max × nM / (nM + Km)
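The normalization and least-squares fit described above can be sketched in Python as follows. The source does not name the fitting software or algorithm, so the approach here is an assumption: for a fixed Km the best Abs.max has a closed form, which leaves a one-dimensional search over Km (a log-spaced scan followed by golden-section refinement). The function names, the search range, and the synthetic data are hypothetical, and the reference saturation absorbance is assumed to be blank-corrected already.

```python
import math


def normalize(raw_abs: float, blank_abs: float, reference_saturation_abs: float) -> float:
    """Subtract the blank, then divide by the reference antibody's saturation absorbance."""
    return (raw_abs - blank_abs) / reference_saturation_abs


def saturation(conc_nm: float, abs_max: float, km: float) -> float:
    """Normalized absorbance = Abs.max * nM / (nM + Km)."""
    return abs_max * conc_nm / (conc_nm + km)


def best_abs_max(concs, values, km: float) -> float:
    """Closed-form least-squares Abs.max for a fixed Km."""
    f = [c / (c + km) for c in concs]
    return sum(v * fi for v, fi in zip(values, f)) / sum(fi * fi for fi in f)


def sse(concs, values, km: float) -> float:
    """Sum of squared residuals at the best Abs.max for this Km."""
    a = best_abs_max(concs, values, km)
    return sum((v - saturation(c, a, km)) ** 2 for c, v in zip(concs, values))


def fit_saturation(concs, values, km_min=1e-3, km_max=1e4, grid=200, iters=60):
    """Return (Abs.max, Km) minimizing the squared error; concentrations must be positive."""
    lo_log, hi_log = math.log(km_min), math.log(km_max)
    logs = [lo_log + i * (hi_log - lo_log) / (grid - 1) for i in range(grid)]
    errs = [sse(concs, values, math.exp(x)) for x in logs]
    k = min(range(grid), key=errs.__getitem__)
    lo, hi = logs[max(k - 1, 0)], logs[min(k + 1, grid - 1)]
    g = (math.sqrt(5) - 1) / 2  # golden ratio conjugate
    for _ in range(iters):
        m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if sse(concs, values, math.exp(m1)) < sse(concs, values, math.exp(m2)):
            hi = m2
        else:
            lo = m1
    km = math.exp((lo + hi) / 2)
    return best_abs_max(concs, values, km), km


# Synthetic, noise-free example data (not measurements from the source).
concs = [0.1, 0.3, 1, 3, 10, 30, 100]
values = [saturation(c, 1.2, 5.0) for c in concs]
abs_max, km = fit_saturation(concs, values)
print(round(abs_max, 3), round(km, 3))
```

A lower fitted Abs.max or a higher Km for a mutant than for the wild type would correspond to the weakened binding reported in the results that follow.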

(Results)
FIG. 15(A) shows the amino acid sequence of c-fos. FIG. 15(B) shows the motif obtained by epitope analysis for each antibody clone and the positions of the amino acids substituted in the mutant c-fos proteins. In the mutant c-fos proteins, amino acids predicted to be important for antigen recognition were substituted. FIG. 16 shows the ELISA results comparing the antigen-antibody interaction of the wild type and the mutants. Binding of monoclonal antibodies C1, C2, and C4 to their corresponding mutants was markedly lower than to the wild type. Monoclonal antibody C5 and polyclonal antibody C7 bound only weakly to the wild type, but showed no binding at all to their respective epitope mutants. These results indicate that the peptides obtained by the DECODE method were the actual antibody recognition sites.

For anti-c-fos antibody C1, ELISA was used to test whether the molecular recognition mode obtained by epitope analysis (the importance of each amino acid of the epitope for recognition by the antibody) agrees with the actual antigen-antibody reaction. For epitope 1 shown in FIG. 10(B), which was identified as the epitope of the anti-c-fos antibody, FIG. 17 shows the positions of the amino acids substituted in the mutant c-fos proteins. FIG. 18 shows the ELISA results comparing the antigen-antibody interaction of the wild type and the mutants. Binding of the anti-c-fos antibody to mutants D271A, F272A, L273A, and F274A was markedly lower than to the wild type. When F272 or F274 was replaced with another aromatic amino acid (Y or W), binding of the anti-c-fos antibody was restored. Binding to mutants P275A, A276G, R279A, and P280A was slightly lower than to the wild type, whereas binding to mutants S277A, S278A, and S281A was unchanged. These results show that the molecular recognition mode obtained by the DECODE method agrees with the actual antigen-antibody reaction.
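The mutant names above follow the usual wild-type residue, position, new residue notation. As a small illustration, the Python sketch below applies such a substitution to the residue 271 to 281 window, whose letters are taken from the mutant names listed above (D271, F272, L273, F274, P275, A276, S277, S278, R279, P280, S281). The function name and the `first_position` parameter are hypothetical and only serve this example.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Residues 271-281 of c-fos as implied by the mutant names in the text.
EPITOPE_WINDOW = "DFLFPASSRPS"


def apply_mutation(fragment: str, mutation: str, first_position: int = 1) -> str:
    """Apply a point substitution such as "F272A" to a sequence fragment.

    first_position is the protein position of fragment[0].
    """
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - first_position
    if not 0 <= index < len(fragment):
        raise ValueError(f"position {position} lies outside the fragment")
    if fragment[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {fragment[index]}"
        )
    return fragment[:index] + replacement + fragment[index + 1:]


print(apply_mutation(EPITOPE_WINDOW, "F272A", first_position=271))  # DALFPASSRPS
print(apply_mutation(EPITOPE_WINDOW, "F272Y", first_position=271))  # DYLFPASSRPS
```

Checking the wild-type letter before substituting catches off-by-one numbering errors, which are easy to make when a fragment rather than the full-length sequence is used.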

(Example 3: Evaluation of the cross-reactivity of anti-c-fos antibodies)
Cross-reactivity was evaluated for eight anti-c-Fos antibodies from different clones (monoclonal antibodies C1, C2, C4, C5, and C8, and polyclonal antibodies C3, C6, and C7). The similarity between the amino acid sequences of the peptides that converged in the DECODE method and approximately 20,000 human proteins was calculated. The similarity was calculated with the BLOSUM62 matrix as the substitution score function, with negative values set to 0.
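A rough Python sketch of this kind of proteome-wide comparison is given below. This section does not spell out how a peptide is compared with a full-length protein, so the best ungapped window score used here is an assumption, as are the function names. The matrix and the three-entry "proteome" are hypothetical toy data, not BLOSUM62 and not real human proteins.

```python
# Hypothetical toy scores: 4 for identical residues plus two off-diagonal
# entries. This is NOT the real BLOSUM62 matrix.
TOY_MATRIX = {(aa, aa): 4 for aa in "ACDEFGHIKLMNPQRSTVWY"}
TOY_MATRIX[("F", "Y")] = 3
TOY_MATRIX[("D", "F")] = -3


def clipped(a: str, b: str, matrix: dict) -> int:
    """Symmetric lookup with negative scores replaced by 0."""
    return max(0, matrix.get((a, b), matrix.get((b, a), 0)))


def best_window_score(peptide: str, protein: str, matrix: dict) -> int:
    """Highest ungapped score of the peptide against any window of the protein."""
    n = len(peptide)
    if len(protein) < n:
        return 0
    return max(
        sum(clipped(a, b, matrix) for a, b in zip(peptide, protein[i:i + n]))
        for i in range(len(protein) - n + 1)
    )


def rank_proteins(peptide: str, proteome: dict, matrix: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {
        name: best_window_score(peptide, seq, matrix)
        for name, seq in proteome.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sequences for illustration only.
toy_proteome = {
    "target": "MKTDFLFPASSRPSGG",
    "decoy1": "MKTDYLAPGSSRAAGG",
    "decoy2": "GGGGGGGGGGGGGGGG",
}
for name, score in rank_proteins("DFLFPA", toy_proteome, TOY_MATRIX):
    print(name, score)
```

In such a ranking, an antibody whose converged peptides score highest against the intended target would be expected to be more specific than one for which other proteins rank above the target.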

(Results)
FIG. 19 shows, for each antibody, the similarity of the 100 proteins with the highest similarity. The arrows in FIG. 19 indicate the target protein, c-fos. For monoclonal antibodies C1 and C2 and polyclonal antibodies C3 and C7, the similarity to c-fos was the highest of all proteins. For monoclonal antibodies C4 and C5, c-fos did not rank first, which suggests low specificity. For monoclonal antibody C8 and polyclonal antibody C6, c-fos ranked 100th or lower. For C1, independent epitope analyses reproducibly gave c-fos as the protein with the highest similarity.

(Example 4: DECODE analysis of plasma from experimental autoimmune encephalomyelitis (EAE) model mice)
Ten-week-old C57B6 mice were immunized with MOG35-55 peptide (MEVGWYRSPFSRVVHLYRNGK (SEQ ID NO: 24)), a part of myelin-oligodendrocyte glycoprotein (MOG), using complete Freund's adjuvant. Whole blood was collected 20 to 22 days after immunization and mixed with an equal volume of PBS, and the plasma fraction was separated using Ficoll. 100 μL of plasma was bound to 100 μL of Protein G magnetic beads (Dynabeads) at 4°C for 1 hour. The beads were washed five times with 500 μL of washing buffer (50 mM Tris-HCl (pH 7.5), 300 mM NaCl, and 0.1% Triton X-100) to obtain magnetic beads on which the EAE mouse antibodies were immobilized. Using these magnetic beads, five rounds of the DECODE method were performed in the same manner as in Example 1 on a template DNA library containing the above-described template DNA with n = 8, and sequencing was performed on a HiSeq 2500 with 80-base single reads.

Multiple alignment was performed on the amino acid sequences translated from the base sequences obtained. In addition, for partial sequences of mouse proteins, a score was calculated by comparing the reference data with the group of amino acid sequences obtained.
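One way to picture such a comparison is as a distance between two sets of similarity values for a given partial sequence: one set computed against random sequences (the reference data) and one computed against the recovered peptides. The Python sketch below uses the two-sample Kolmogorov-Smirnov statistic as that distance. This choice is an assumption made for illustration and is not stated in this section; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical cumulative
    distribution functions: 0.0 for identical samples, 1.0 when the samples
    do not overlap at all.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical similarity values of one partial sequence against random
# sequences (reference) and against peptides recovered after selection.
reference_similarities = [2, 3, 3, 4, 5, 5, 6, 7]
observed_similarities = [9, 11, 12, 12, 14, 15, 15, 16]
print(ks_distance(reference_similarities, observed_similarities))  # 1.0
```

A partial sequence whose observed similarities are shifted well away from the random-sequence baseline receives a large distance, which is the sort of signal that would place it near the top of a ranking of candidate binding sites.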

(Results)
Based on the multiple alignment, a motif highly homologous to a part of the MOG35-55 peptide was obtained, as shown in FIG. 20. As shown in FIG. 21(A), partial sequences of MOG35 were detected at ranks 140 and 399 of the normalized scores. In contrast, as shown in FIG. 21(B), no partial sequence of MOG was detected in plasma from mice that had not been immunized with the MOG35-55 peptide.

Various embodiments and modifications of the present invention are possible without departing from its broad spirit and scope. The embodiments described above are intended to explain the present invention and do not limit its scope. That is, the scope of the present invention is indicated by the claims, not by the embodiments. Various modifications made within the scope of the claims, and within the scope of inventive significance equivalent thereto, are regarded as falling within the scope of the present invention.

The present invention is suitable for obtaining information on the binding site of a test substance in a protein, and in particular for predicting the epitope of an antibody.

1 acquisition unit, 2 output unit, 10 storage unit, 11 reference data, 12 protein amino acid sequence data, 13 information acquisition program, 20 RAM, 30 input device, 40 display device, 50 CPU, 60 bus, 100 protein binding site information acquisition device

Claims

1. A protein binding site information acquisition device comprising:
an acquisition unit that acquires information about the amino acid sequence of an epitope of an antibody in an antigen based on a comparison between first information indicating a distribution of similarity between a partial sequence of the antigen and a plurality of random amino acid sequences and second information indicating a distribution of similarity between the amino acid sequences of peptides that bind to the antibody and the partial sequence, the distribution being converged by repeatedly binding peptides, each generated in correspondence with DNA through transcription and translation from DNA contained in a DNA library having random base sequences, to the antibody that binds to the antigen and recovering the peptides together with the DNA.

2. The protein binding site information acquisition device according to claim 1, wherein the acquisition unit calculates a distance between the distribution of similarity indicated by the first information and the distribution of similarity indicated by the second information, and evaluates the binding specificity of the antibody to the epitope.

3. The protein binding site information acquisition device according to claim 1 or 2, wherein the acquisition unit predicts the amino acid sequence of the epitope based on the amino acid sequence of a peptide selected by comparing the first information with the similarity between the amino acid sequence of the peptide that binds to the antibody and the partial sequence.

4. A method for operating a protein binding site information acquisition device including an acquisition unit, wherein
the acquisition unit obtains information about the amino acid sequence of an epitope of an antibody in an antigen based on a comparison between first information indicating a distribution of similarity between a partial sequence of the antigen and a plurality of random amino acid sequences and second information indicating a distribution of similarity between the amino acid sequences of peptides that bind to the antibody and the partial sequence, the distribution being converged by repeatedly binding peptides, each generated in correspondence with DNA through transcription and translation from DNA contained in a DNA library having random base sequences, to the antibody that binds to the antigen and recovering the peptides together with the DNA.

5. A program that causes a computer to function as:
means for referring to first information indicating a distribution of similarity between a partial sequence of an antigen and a plurality of random amino acid sequences;
means for comparing the first information with second information indicating a distribution of similarity between the amino acid sequences of peptides that bind to an antibody and the partial sequence, the second information being converged by repeatedly binding peptides, each generated in correspondence with DNA through transcription and translation from DNA contained in a DNA library having random base sequences, to the antibody that binds to the antigen and recovering the peptides together with the DNA; and
means for obtaining information about the amino acid sequence of the epitope of the antibody in the antigen based on the comparison.