JP6857597B2

JP6857597B2 - Biomarker search method for urinary metabolites

Info

Publication number: JP6857597B2
Application number: JP2017236306A
Authority: JP
Inventors: 坂入　実
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2017-12-08
Filing date: 2017-12-08
Publication date: 2021-04-14
Anticipated expiration: 2037-12-08
Also published as: JP2019105456A

Description

The present invention relates to a method, a system, and a program for searching for urinary metabolite markers. Specifically, the present invention relates to a method, a system, and a program for searching for urinary metabolite markers associated with a specific disease or condition, in particular cancer.

As a conventional cancer testing method, a method using patient-derived metabolites has been reported (Patent Document 1); it performs cancer testing quickly and easily by analyzing changes in metabolites in blood. Specifically, it consists of (1) a step of acquiring mass spectrum data of patient-derived metabolites by mass spectrometry (MS); (2) a step of comparing the mass spectrum data of the patient-derived metabolites with those of metabolites derived from patients after surgical resection to detect changes in the mass spectra; and (3) a step of determining, from the detected changes, whether the patient-derived sample is a sample derived from a cancer patient.

Japanese Unexamined Patent Publication No. 2010-266386

A feature of the conventional method is that the cancer test relies not on a single biomarker but mainly on multiple markers, in an attempt to reflect the patient's pathological condition in more detail. In the analysis of the mass spectrum data, cancer is determined by analyzing the disappearance, appearance time, or retention time of specific mass spectra in the acquired data, or an increase or decrease in their peak intensity. The analysis model in this case is obtained by multivariate analysis such as OPLS-DA.

In general, in the workflow of urinary metabolite analysis with a liquid chromatograph/mass spectrometer (LC/MS), urinary metabolites are first analyzed comprehensively by LC/MS in multiple analysis modes, and metabolites that differ significantly between groups (for example, healthy subjects and cancer patients) are then extracted. The multiple analysis modes are combinations of the reversed-phase and hydrophilic interaction modes of the liquid chromatograph with the positive and negative ionization modes of the mass spectrometer.

The data obtained by LC/MS are three-dimensional: the elution time of each metabolite in the liquid chromatograph, and the mass and intensity of each metabolite's ions in the mass spectrometer (see the upper part of FIG. 2). After these data are preprocessed, multivariate analysis (principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), orthogonal partial least squares discriminant analysis (OPLS-DA), kernel discriminant analysis, and the like) is performed while statistical tests confirm which metabolites differ significantly between two groups (for example, healthy subjects and cancer patients). Biomarker candidates effective for discriminating the two groups are extracted, and a (cancer) test model is constructed. Such methods have so far identified biomarkers useful for analyzing sweetness components and functional components of agricultural products and foods, and for blood-based colorectal cancer testing.

However, in cancer testing using urinary metabolites, for example, even OPLS discriminant analysis may not be effective for extracting biomarkers. Biomarker extraction by OPLS discriminant analysis generally uses an S-plot. For the metabolites detected by comprehensive LC/MS analysis, the S-plot shows on the horizontal axis the absolute value of the difference in peak intensity between groups (the larger the value, the larger the between-group difference and the stronger the biomarker candidate) and on the vertical axis the reliability (repeatability; the correlation coefficient between the between-group difference and the within-group error, where a value closer to 1 means less individual variation and a clearer between-group difference). It is widely used to search for biomarker candidates. In cancer analysis, however, urinary metabolites may show only modest intensity differences between groups and large within-group errors. In addition, the candidate metabolites number in the thousands and are structurally diverse, so narrowing down biomarker candidates with an S-plot is very difficult.
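
The two S-plot axes described above can be illustrated with a minimal sketch. It assumes that the 0/1 group label stands in for the OPLS-DA predictive score (a real S-plot uses the score vector from a fitted OPLS-DA model), and the function name `s_plot_coordinates` and the synthetic data are hypothetical.

```python
import numpy as np

def s_plot_coordinates(X, y):
    """Return (covariance, correlation) of every metabolite with a score vector.

    X: (n_samples, n_metabolites) peak intensities.
    y: (n_samples,) 0 = healthy, 1 = patient. Assumption of this sketch:
       the label itself is used in place of the OPLS-DA predictive score.
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)            # centre each metabolite
    tc = t - t.mean()                  # centre the score vector
    n = len(t)
    cov = Xc.T @ tc / (n - 1)          # horizontal axis: size of the group difference
    corr = cov / (Xc.std(axis=0, ddof=1) * tc.std(ddof=1))  # vertical axis: reliability
    return cov, corr

# Synthetic illustration: 20 healthy and 20 patient samples, 5 metabolites.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 5))
X[:, 0] += 3.0 * y                     # metabolite 0: large, consistent shift
X[:, 1] += 0.3 * y                     # metabolite 1: small shift buried in noise
cov, corr = s_plot_coordinates(X, y)
```

In this toy data, metabolite 0 lies far from the origin on both axes, whereas metabolite 1 stays close to the noise-only metabolites. This is the situation the text describes for urinary metabolites with small group differences and large within-group errors.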

Therefore, an object of the present invention is to provide an improved method for searching for biomarkers among urinary metabolites.

In view of the above problem, we examined changes to the workflow for constructing a cancer test model using urine samples. That is, instead of extracting important biomarker candidates directly from the S-plot of OPLS discriminant analysis, the following analysis flow is used:
(a) From the results of comprehensive LC/MS analysis, optionally narrow down, by p-value using the Wilcoxon rank-sum test, the urinary metabolites that differ significantly in cancer patients relative to healthy subjects. The significance level is usually set to 5%.
(b) The above test can narrow down the metabolites that are important biomarker candidates, but it does not lend itself to quantitative evaluation. The importance of the narrowed-down metabolites is therefore evaluated quantitatively using the random forest (RF) method, a machine learning technique. The higher the importance value, the more important the metabolite and the higher it is ranked.
(c) Using the extracted group of biomarkers, calculate a discriminant for distinguishing healthy subjects from cancer patients by a discriminant analysis method (for example, OPLS or kernel discriminant analysis), and construct a cancer test model.
(d) Optionally, prepare multiple cancer test models with different numbers of biomarkers. For example, construct cancer test models that use the 2, 10, and 30 most important biomarkers.

In one aspect, the present invention provides a method for searching for urinary metabolite markers, comprising:
(a) a step of subjecting a urine sample to a liquid chromatograph/mass spectrometer (LC/MS) and analyzing the urinary metabolites in the urine sample;
(b) a step of quantitatively evaluating the importance of the urinary metabolites by the random forest method based on the analysis data of the urinary metabolites, and selecting urinary metabolites of high importance;
(c) a step of performing discriminant analysis using the analysis data of the selected urinary metabolites; and
(d) a step of determining, based on the results of the discriminant analysis, urinary metabolites associated with a specific disease or condition as marker candidates.

In one aspect, the present invention provides a method for searching for urinary metabolite markers, comprising:
(a) a step of subjecting a urine sample to a liquid chromatograph/mass spectrometer (LC/MS) and analyzing the urinary metabolites in the urine sample;
(b) a step of, based on the analysis data of the urinary metabolites, optionally narrowing down by the Wilcoxon rank-sum test the urinary metabolites that are significantly increased or decreased in patients with a specific disease or condition relative to healthy subjects, then quantitatively evaluating the importance of the urinary metabolites by the random forest method, and selecting urinary metabolites of high importance;
(c) a step of constructing a test model by discriminant analysis using the selected urinary metabolites as biomarkers; and
(d) optionally, a step of constructing, based on the results of the discriminant analysis, multiple test models with different numbers of biomarkers.

In another aspect, the present invention provides a system for searching for urinary metabolite markers, comprising:
a counting unit to which analysis data of urinary metabolites obtained by a liquid chromatograph/mass spectrometer (LC/MS) are input;
a calculation unit that performs numerical analysis using the analysis data of the urinary metabolites;
a determination unit that, based on the results of the analysis, selects urinary metabolites and/or determines urinary metabolites as marker candidates; and
an output processing unit that processes the determination results of the determination unit for output,
wherein the calculation unit quantitatively evaluates the importance of the urinary metabolites by the random forest method using the analysis data of the urinary metabolites, and performs discriminant analysis using the analysis data of the urinary metabolites, and
the determination unit selects urinary metabolites of high importance based on the results of the evaluation by the random forest method, and determines urinary metabolites as marker candidates based on the results of the discriminant analysis.

In another aspect, the present invention provides a system for searching for urinary metabolite markers, comprising:
a counting unit to which analysis data of urinary metabolites obtained by a liquid chromatograph/mass spectrometer (LC/MS) are input;
a calculation unit that performs numerical analysis using the analysis data of the urinary metabolites;
a determination unit that, based on the results of the analysis, selects urinary metabolites and/or determines urinary metabolites as marker candidates; and
an output processing unit that processes the determination results of the determination unit for output,
wherein the calculation unit, using the analysis data of the urinary metabolites, optionally narrows down biomarker candidates by the Wilcoxon rank-sum test, then quantitatively evaluates the importance of the urinary metabolites by the random forest method, and performs discriminant analysis using the urinary metabolites as biomarkers, and
the determination unit selects urinary metabolites of high importance based on the results of the evaluation by the random forest method, determines urinary metabolites as marker candidates based on the results of the discriminant analysis, and optionally constructs a test model by discriminant analysis.

In yet another aspect, the present invention provides a program for searching for urinary metabolite markers, the program causing a computer to execute:
a step of quantitatively evaluating the importance of urinary metabolites by the random forest method based on analysis data of the urinary metabolites obtained by a liquid chromatograph/mass spectrometer (LC/MS), and selecting urinary metabolites of high importance;
a step of performing discriminant analysis using the analysis data of the selected urinary metabolites; and
a step of determining, based on the results of the discriminant analysis, urinary metabolites associated with a specific disease or condition as marker candidates.

In another aspect, the present invention provides a program for searching for urinary metabolite markers, the program causing a computer to execute:
a step of, based on analysis data of urinary metabolites obtained by a liquid chromatograph/mass spectrometer (LC/MS), optionally narrowing down by the Wilcoxon rank-sum test the urinary metabolites that are significantly increased or decreased in patients with a specific disease or condition relative to healthy subjects, then quantitatively evaluating the importance of the urinary metabolites by the random forest method, and selecting urinary metabolites of high importance;
a step of constructing a test model by discriminant analysis using the selected urinary metabolites as biomarkers; and
optionally, a step of constructing, based on the results of the discriminant analysis, multiple test models with different numbers of biomarkers.

Further features relating to the present invention will become apparent from the description herein and the accompanying drawings.

According to the present invention, biomarker candidates can be narrowed down effectively in the search for urinary metabolite biomarkers, and urinary metabolites significantly associated with a specific disease or condition can be determined quickly and reliably. The present invention is therefore useful for constructing test models for diseases, in particular cancer test models. The present invention is also useful in fields such as early detection of diseases or conditions and development of therapeutic agents.

FIG. 1 shows an example of the analysis flow of the present invention.
FIG. 2 shows a display example of the analysis flow shown in FIG. 1 and the results obtained.
FIG. 3 shows an example of the analysis system of the present invention.
FIG. 4 shows the analysis flow of the present invention in detail, together with the analysis process for metabolites of unknown structure.
FIG. 5 shows an example of the database structure in the analysis system of the present invention.
FIG. 6 shows an example of the total metabolite database structure in the analysis system of the present invention.
FIG. 7 shows an example of a Wilcoxon rank-sum test table, with the metabolites listed in order of p-value.
FIG. 8 is a graph showing an example of importance evaluation by the random forest method.
FIG. 9 shows an example of important metabolites identified by the random forest method, listed in descending order of importance.
FIG. 10 shows a display example of the total number of metabolites detected by LC/MS.
FIG. 11 shows a display example of the total number of metabolites narrowed down by the Wilcoxon rank-sum test.
FIG. 12 shows an example of OPLS discriminant analysis results obtained with one biomarker.
FIG. 13 shows an example of discrimination results obtained with one biomarker.
FIG. 14 shows an example of OPLS discriminant analysis results obtained with six biomarkers.
FIG. 15 shows an example of discrimination results obtained with six biomarkers.

The present invention provides a method, a system, and a program for searching for biomarkers among urinary metabolites. FIG. 1 shows an example of the analysis flow of the present invention, and FIG. 2 shows a display example of the results obtained with the analysis flow of FIG. 1. FIG. 4 shows a detailed analysis flow for a cancer test model (upper part) together with the analysis process for metabolites of unknown structure (lower part). In the analysis flow, comprehensive analysis of urinary metabolites by liquid chromatograph/mass spectrometer (LC/MS) is followed by data preprocessing. Biomarker candidates are optionally narrowed down by the Wilcoxon rank-sum test, and their importance is then evaluated quantitatively by the random forest method. Using the top-ranked biomarkers, a prediction formula for identifying a specific disease or condition is calculated by discriminant analysis, a form of multivariate analysis (for example, OPLS discriminant analysis or kernel discriminant analysis), and a test model is constructed. Urinary metabolites associated with the specific disease or condition are then determined as biomarker candidates. It is also preferable to construct multiple test models with different numbers of biomarkers so that the test model to be used can be selected in view of test accuracy and price.

Accordingly, the method of the present disclosure for searching for urinary metabolite markers comprises:
(a) a step of subjecting a urine sample to a liquid chromatograph/mass spectrometer (LC/MS) and analyzing the urinary metabolites in the urine sample;
(b) a step of quantitatively evaluating the importance of the urinary metabolites by the random forest (RF) method based on the analysis data of the urinary metabolites, and selecting urinary metabolites of high importance;
(c) a step of performing discriminant analysis using the analysis data of the selected urinary metabolites; and
(d) a step of determining, based on the results of the discriminant analysis, urinary metabolites associated with a specific disease or condition as marker candidates.

In a more specific embodiment, the method of the present disclosure for searching for urinary metabolite markers comprises:
(a) a step of subjecting a urine sample to a liquid chromatograph/mass spectrometer (LC/MS) and analyzing the urinary metabolites in the urine sample;
(b) a step of, based on the analysis data of the urinary metabolites, optionally narrowing down by the Wilcoxon rank-sum test the urinary metabolites that are significantly increased or decreased in patients with a specific disease or condition relative to healthy subjects, then quantitatively evaluating the importance of the urinary metabolites by the random forest (RF) method, and selecting urinary metabolites of high importance;
(c) a step of constructing a test model by discriminant analysis using the selected urinary metabolites as biomarkers; and
(d) optionally, a step of constructing, based on the results of the discriminant analysis, multiple test models with different numbers of biomarkers.

First, as shown in FIGS. 1 and 4, a urine sample is subjected to a liquid chromatograph/mass spectrometer and analyzed comprehensively. A urine sample means urine collected from a subject or a sample obtained by processing that urine (for example, urine deproteinized with methanol, or urine to which a preservative such as toluene, xylene, or hydrochloric acid has been added). Subjects from which urine samples are derived include subjects with a specific disease or condition, healthy subjects, subjects who previously had a specific disease or condition but are now healthy, and subjects at high risk of developing a specific disease or condition (with a family history). The subject species include humans and other mammals, such as primates (monkeys, chimpanzees, etc.), livestock (cattle, horses, pigs, sheep, etc.), pets (dogs, cats, etc.), and laboratory animals (mice, rats, rabbits, etc.); humans are preferred.

Analysis by liquid chromatograph/mass spectrometer (LC/MS) can be performed using a conventional LC/MS instrument. The analysis mode is not particularly limited: either the reversed-phase mode or the hydrophilic interaction mode of the liquid chromatograph, and either the positive or the negative ionization mode of the mass spectrometer, may be used. Performing the analysis in multiple analysis modes allows urinary metabolites to be detected comprehensively.

Analysis data on the urinary metabolites are obtained from this analysis. The analysis data include, for example, the mass spectral data measured for each urinary metabolite (retention time in the liquid chromatograph, mass, ion intensity, peak area, and the like) and clinical information on the subject from whom each urine sample was derived. They may further include information on the urinary metabolites (whether the structure is known or unknown), information on the LC/MS measurement conditions (analysis mode), and the like. The analysis data may be displayed in the form of a graph (for example, the upper part of FIG. 2) or a table (for example, FIG. 6).

The obtained analysis data are preferably preprocessed for the subsequent steps. For example, for ion intensities measured by LC/MS, missing values are first replaced with the minimum value in the data for each substance, and, if measurements extend over multiple days, day-to-day differences are corrected using the median of each metabolite as the reference. Differences between samples due to urine concentration are then standardized using osmolality values obtained by the freezing point depression method, and differences in data distribution between substances are normalized by setting the median to 1.0. The concentration of a urine sample is also expected to vary with the subject's physical condition. To correct for this, a correction based on quantification of urinary creatinine, or a correction based on the concentration of urinary solutes measured by freezing point depression, for example a correction using osmolality measurement, may be applied. These corrections yield normalized analysis data.
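
The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the procedure of the specification: the function name `preprocess` is hypothetical, and the multiplicative form of the day-to-day correction and the division by osmolality are assumptions, since the text does not give the formulas.

```python
import numpy as np

def preprocess(intensity, batch, osmolality):
    """Normalize an LC/MS intensity table (rows: samples, columns: metabolites).

    intensity:  (n_samples, n_metabolites), np.nan marks a missing peak.
    batch:      (n_samples,) measurement-day label of each sample.
    osmolality: (n_samples,) osmolality of each urine sample.
    """
    X = np.array(intensity, dtype=float)
    batch = np.asarray(batch)
    # 1. Replace missing values with the minimum observed for that metabolite.
    col_min = np.nanmin(X, axis=0)
    X = np.where(np.isnan(X), col_min, X)
    # 2. Day-to-day correction (assumed multiplicative): rescale each day so
    #    that its per-metabolite median matches the overall median.
    overall = np.median(X, axis=0)
    for b in np.unique(batch):
        rows = batch == b
        X[rows] *= overall / np.median(X[rows], axis=0)
    # 3. Urine-concentration correction (assumed form): divide by osmolality.
    X = X / np.asarray(osmolality, dtype=float)[:, None]
    # 4. Scale each metabolite so that its median is 1.0.
    return X / np.median(X, axis=0)

# Toy table: 4 samples measured on 2 days, 2 metabolites, one missing peak.
raw = np.array([[100.0, 20.0], [np.nan, 40.0], [300.0, 60.0], [200.0, 80.0]])
clean = preprocess(raw, batch=[1, 1, 2, 2], osmolality=[500.0, 1000.0, 500.0, 1000.0])
```

The sketch assumes strictly positive intensities and at least one observed value per metabolite; a creatinine-based correction would replace step 3 with a division by the creatinine concentration.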

Next, before discriminant analysis is performed on the urinary metabolites, the metabolites are narrowed down. It is possible to subject all metabolites to the random forest method and evaluate their importance quantitatively, but narrowing down the metabolites with the Wilcoxon rank-sum test is effective for reducing the computational load. This test examines whether the data obtained for the groups differ significantly, and the significance level is often set to 5% (p < 0.05). The test results may be displayed in the form of a table listing the urinary metabolites and their p-values (for example, FIG. 7). In this way, urinary metabolites that change significantly in association with a specific disease or condition are selected, and the narrowed-down set then proceeds to the next step, evaluation by the random forest method.
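
A minimal sketch of this narrowing-down step is given below, assuming SciPy is available. It uses `scipy.stats.mannwhitneyu`, which is equivalent to the Wilcoxon rank-sum test; the function name `wilcoxon_filter` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def wilcoxon_filter(X, y, alpha=0.05):
    """Return (indices of metabolites with p < alpha sorted by p, all p-values).

    X: (n_samples, n_metabolites) normalized intensities.
    y: (n_samples,) 0 = healthy subject, 1 = patient.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    healthy, patient = X[y == 0], X[y == 1]
    # One two-sided rank-sum test per metabolite (column).
    p = np.array([
        mannwhitneyu(healthy[:, j], patient[:, j], alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ])
    keep = np.flatnonzero(p < alpha)
    return keep[np.argsort(p[keep])], p

# Synthetic illustration: 15 healthy, 15 patients, 6 metabolites.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 15)
X = rng.normal(size=(30, 6))
X[:, 2] += 3.0 * y                      # only metabolite 2 truly differs
selected, p_values = wilcoxon_filter(X, y)
```

Sorting the surviving metabolites by p-value reproduces the kind of table described for FIG. 7. The p-value only indicates whether a difference exists, not how useful a metabolite is for discrimination, which is why the flow continues with the random forest step.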

The random forest (RF) method is a type of machine learning method whose major feature is that the importance of each feature can be calculated. The evaluation data may be displayed in the form of a graph (for example, FIG. 8) or a table (for example, FIG. 9) arranged in order of importance. Metabolites with a high importance value, that is, metabolites ranked near the top of the analysis results, are likely to serve as biomarkers.
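
The importance ranking can be sketched with scikit-learn's `RandomForestClassifier`. The text does not state which importance measure is used; the impurity-based `feature_importances_` attribute is assumed here, and the function name, tree count, and metabolite names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_by_importance(X, y, names, n_trees=500, seed=0):
    """Return (name, importance) pairs sorted from most to least important."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    importance = forest.feature_importances_   # impurity-based, sums to 1
    order = np.argsort(importance)[::-1]       # descending importance
    return [(names[i], float(importance[i])) for i in order]

# Synthetic illustration: 30 healthy, 30 patients, 8 candidate metabolites.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 8))
X[:, 3] += 2.5 * y                             # metabolite_3 carries the signal
names = [f"metabolite_{i}" for i in range(8)]
ranking = rank_by_importance(X, y, names)
```

The sorted list corresponds to the kind of table described for FIG. 9, and the importance values themselves to the graph described for FIG. 8.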

A discriminant is calculated by discriminant analysis using the selected biomarkers. As the discriminant analysis method, OPLS discriminant analysis, kernel discriminant analysis, or the like can be used. OPLS discriminant analysis is effective for identifying differences between groups defined by a specific disease or condition. In a score plot, in which each sample is plotted as one point, separation along the horizontal axis indicates a difference between groups (the vertical axis represents differences within a group), which greatly aids visual understanding (for example, the graph in the lower part of FIG. 2).

This specification focuses on OPLS discriminant analysis, which is a linear data analysis method. However, as the number of clinical cases grows and the data become more complex, a method that maps the data into a high-dimensional feature space before performing linear analysis, such as kernel discriminant analysis, may be effective; the method is not limited to OPLS discriminant analysis.
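
The idea of mapping the data into a feature space before a linear analysis can be illustrated with a small sketch. It approximates kernel discriminant analysis by an RBF kernel PCA followed by linear discriminant analysis, which is a stand-in and not the method of the specification; the dataset (two concentric rings), `gamma`, and the number of components are arbitrary choices for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# Two groups that no straight line can separate: an inner and an outer ring.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.3, random_state=0)

# Linear discriminant analysis applied directly to the raw coordinates.
linear = LinearDiscriminantAnalysis().fit(X, y)

# Map to an RBF feature space first, then apply the same linear method.
kernel = make_pipeline(
    KernelPCA(n_components=5, kernel="rbf", gamma=5.0),
    LinearDiscriminantAnalysis(),
).fit(X, y)

linear_acc = linear.score(X, y)   # training accuracy of the linear model
kernel_acc = kernel.score(X, y)   # training accuracy after the kernel map
```

On data of this shape the linear model performs near chance while the kernel pipeline separates the two groups, which is the kind of situation in which the text suggests a kernel method may be preferable.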

When a discriminant is calculated using urinary metabolites associated with a specific disease or condition as biomarkers, the discriminant differs depending on the number of biomarkers selected. A single biomarker may be selected, or several may be selected in combination. Combining several biomarkers increases the power to discriminate the specific disease or condition and, as a result, the accuracy of the test. In this way, a test model for the specific disease or condition is constructed. The analysis data, analysis results, test models, and so on obtained up to this point may be displayed together in table form (for example, FIG. 5).
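
Building several test models with different numbers of biomarkers can be sketched as below. Linear discriminant analysis stands in for the OPLS or kernel discriminant analysis named in the text, the panel sizes 2, 10, and 30 follow the example given earlier in the specification, and the function name `build_panels` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def build_panels(X, y, sizes=(2, 10, 30), seed=0):
    """Evaluate one discriminant model per panel size, using the top-k markers."""
    forest = RandomForestClassifier(n_estimators=300, random_state=seed).fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]   # most important first
    panels = {}
    for k in sizes:
        cols = order[:k]
        # 5-fold cross-validated accuracy of the stand-in discriminant model.
        acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, cols], y, cv=5).mean()
        panels[k] = {"markers": cols.tolist(), "cv_accuracy": float(acc)}
    return panels

# Synthetic illustration: 60 samples, 40 metabolites, the first 10 informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 40))
X[:, :10] += 1.0 * y[:, None]
panels = build_panels(X, y)
```

Comparing `panels[k]["cv_accuracy"]` across panel sizes gives one way to weigh test accuracy against the cost of measuring more markers. Because the ranking is computed on all samples before cross-validation, these accuracies are optimistic; an independent validation set would be needed for an unbiased estimate.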

Here, the specific disease or condition is not particularly limited as long as it is a disease or condition for which a biomarker is sought, that is, one for which diagnosis, determination, risk assessment, or the like is desired. Preferably, the disease or condition is cancer. Cancer means a disease also called a malignant tumor or malignant neoplasm, and is characterized by autonomous growth, invasion and metastasis, and cachexia. Cancers include, but are not limited to, solid cancers (breast cancer, colon cancer, lung cancer, prostate cancer, stomach cancer, colorectal cancer, pancreatic cancer, kidney cancer, ovarian cancer, esophageal cancer, liver cancer, biliary tract cancer, bladder cancer, childhood cancer, etc.), sarcomas (osteosarcoma, chondrosarcoma, etc.), and blood cancers (leukemia, malignant lymphoma, multiple myeloma, etc.). Cancers may be primary, metastatic, or recurrent; the degree of malignancy differs by cancer type, and cancers are classified into stages according to their progression and extent of spread. The necessary treatment differs depending on whether the cancer is primary, metastatic, or recurrent, on its malignancy and stage, and on the type of cancer. Therefore, if urinary metabolite markers associated with a specific cancer type and/or condition can be found, those markers can be used to determine the necessary treatment or to make an early diagnosis. Specific diseases or conditions include, but are not limited to, the presence or absence of cancer, the type of cancer, the degree of cancer progression, the malignancy of cancer, the stage of cancer, the prognosis of cancer, the therapeutic effect on cancer, and sensitivity to anticancer agents.

The method for searching for urinary metabolite markers described above can be carried out simply and efficiently using a suitable system or program. The system of the present disclosure for searching for urinary metabolite markers comprises:
a counting unit to which analysis data of urinary metabolites obtained by a liquid chromatograph mass spectrometer (LC/MS) are input;
a calculation unit that performs numerical analysis using the analysis data of the urinary metabolites;
a determination unit that selects urinary metabolites and/or determines urinary metabolites as marker candidates based on the results of the analysis; and
an output processing unit that processes the determination results of the determination unit for output,
wherein the calculation unit quantitatively evaluates the importance of the urinary metabolites by the random forest (RF) method using the analysis data of the urinary metabolites, and performs discriminant analysis using the analysis data of the urinary metabolites, and
the determination unit selects urinary metabolites of high importance based on the results of the evaluation by the random forest (RF) method, and determines urinary metabolites as marker candidates based on the results of the discriminant analysis.

In a more specific embodiment, the system of the present disclosure for searching for urinary metabolite markers comprises:
a counting unit to which analysis data of urinary metabolites obtained by a liquid chromatograph mass spectrometer (LC/MS) are input;
a calculation unit that performs numerical analysis using the analysis data of the urinary metabolites;
a determination unit that selects urinary metabolites and/or determines urinary metabolites as marker candidates based on the results of the analysis; and
an output processing unit that processes the determination results of the determination unit for output,
wherein the calculation unit, using the analysis data of the urinary metabolites, optionally narrows down the biomarker candidates by the Wilcoxon rank sum test, then quantitatively evaluates the importance of the urinary metabolites by the random forest (RF) method, and performs discriminant analysis using the urinary metabolites as biomarkers, and
the determination unit selects urinary metabolites of high importance based on the results of the evaluation by the random forest (RF) method, determines urinary metabolites as marker candidates based on the results of the discriminant analysis, and optionally constructs a test model by discriminant analysis.

In the system of the present disclosure,
the calculation unit may perform the Wilcoxon rank sum test using the analysis data of the urinary metabolites, and
the determination unit may, based on the results of the Wilcoxon rank sum test, narrow down the urinary metabolites to those that are significantly increased or decreased in patients with a specific disease or condition relative to healthy subjects.

Further, in the system of the present disclosure,
the calculation unit may perform discriminant analysis using the selected urinary metabolites as biomarkers, and
the determination unit may construct a test model by the discriminant analysis.

The system of the present disclosure is preferably a system in which the counting unit, calculation unit, determination unit, and output processing unit described above are operably linked or communicatively connected to one another in memory so that the method described above can be carried out. The system of the present disclosure may be linked or communicatively connected to a communication device, an input device, an output device, and/or a storage device. A specific configuration example of the analysis system is shown in FIG. 3.

Analysis data of urinary metabolites obtained by a liquid chromatograph mass spectrometer (LC/MS) are input to the counting unit. The analysis data may be input via a linked input device or via a communication device. Alternatively, the counting unit may be linked to the liquid chromatograph mass spectrometer and receive the analysis data directly from it. The input analysis data include the mass spectra of the urinary metabolites measured by LC/MS (retention time in the liquid chromatograph, mass, ion intensity, peak area value, etc.) and medical information on the subject from whom each urine sample is derived. The analysis data may further include information on the urinary metabolites (structure-unknown and structure-known metabolites), the LC/MS measurement conditions (analysis mode), and the like. The analysis data may be displayed via an output device in a format such as a graph (for example, the upper part of FIG. 2) or a table (for example, FIG. 6). The analysis data may also be stored in the storage device as a total metabolite table (FIG. 3).
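One possible way to represent the input analysis data is sketched below in Python. The class names, field names, sample identifiers, numeric values, and the choice of keying a peak by (retention time, mass) are all assumptions made for illustration; the specification only lists the kinds of information the data contain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Peak:
    retention_time: float              # retention time in the liquid chromatograph
    mass: float                        # mass value reported by the mass spectrometer
    ion_intensity: float
    peak_area: float
    metabolite: Optional[str] = None   # None for a structure-unknown metabolite


@dataclass
class SampleRecord:
    sample_id: str
    clinical_label: str                # medical information, e.g. "healthy" / "patient"
    analysis_mode: str                 # LC/MS measurement condition (hypothetical value)
    peaks: list = field(default_factory=list)


def peak_area_table(records):
    """Rows keyed by sample, columns keyed by (retention_time, mass)."""
    return {
        rec.sample_id: {(p.retention_time, p.mass): p.peak_area for p in rec.peaks}
        for rec in records
    }


records = [
    SampleRecord("S1", "healthy", "mode_1",
                 [Peak(3.2, 180.06, 1.0e5, 4.2e5, "known_metabolite_A")]),
    SampleRecord("S2", "patient", "mode_1",
                 [Peak(3.2, 180.06, 2.5e5, 9.8e5, "known_metabolite_A"),
                  Peak(7.9, 412.30, 8.0e4, 3.1e5)]),
]
table = peak_area_table(records)
print(table["S2"])
```

A table of this shape (samples by peaks) is the kind of input that the later statistical steps would consume.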

The calculation unit processes the analysis data input to the counting unit or read from the storage device. Specifically, using the analysis data of the urinary metabolites, it optionally narrows down the biomarker candidates by the Wilcoxon rank sum test, and then quantitatively evaluates importance by the random forest (RF) method and/or performs discriminant analysis. In addition, in order to discriminate a specific disease or condition using the biomarkers extracted by quantitatively evaluating importance with the random forest method, it calculates a plurality of discriminants with different numbers of biomarkers by discriminant analysis. These results may be displayed via an output device in a format such as a graph or table. These results may also be stored in the storage device as a random forest (RF) table, discriminant analysis results (table), and a test table, respectively (FIG. 3).

The determination unit selects urinary metabolites and/or determines urinary metabolites as marker candidates based on the results of the analysis performed by the calculation unit. When the calculation unit performs the Wilcoxon rank sum test, the determination unit may narrow down the urinary metabolites based on the results of that test. Furthermore, for an unknown sample, the determination unit can make a determination regarding a specific disease or condition (for example, its presence or absence) according to whether the predicted value calculated by the discriminant is higher or lower than a set threshold.

The output processing unit processes the determination results of the determination unit for output. For example, the selected urinary metabolites, the urinary metabolites determined as marker candidates, and/or the constructed test model are processed for visual display on an output device such as a display. Specifically, they are processed into table or graph form (for example, FIGS. 8, 9, 12 and 14). Alternatively, they may be processed for audio output on an output device such as a speaker.

The storage device stores one or more of the following: the total metabolite table (a list of the metabolites detected in each analysis), the test table (the results of the Wilcoxon rank sum test, ranked by p-value), the RF table (a metabolite table sorted in descending order of importance), the discriminant analysis results (score plot), and the test model (the discriminant obtained by discriminant analysis). Which test model is used depends on the test to be performed, that is, on which disease or condition is to be determined. The score plot separates two groups as far as possible along the horizontal axis; separation along the horizontal axis indicates a difference between the two groups, while the vertical axis represents differences within a group (for example, the graph shown at the bottom of FIG. 2).
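The tables held in the storage device could be organized along the following lines. This Python sketch is only one assumed layout: the `Storage` class, its field names, the `store_results` helper, and the p-values and importance values are hypothetical, while the ordering rules (test table ranked by p-value, RF table in descending importance) follow the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    total_metabolites: list = field(default_factory=list)  # all detected metabolites
    test_table: list = field(default_factory=list)          # (name, p_value), ascending p
    rf_table: list = field(default_factory=list)            # (name, importance), descending
    test_models: dict = field(default_factory=dict)         # disease or condition -> discriminant


def store_results(storage, p_values, importances):
    """Fill the tables using the ordering described for each of them."""
    storage.total_metabolites = sorted(p_values)
    storage.test_table = sorted(p_values.items(), key=lambda kv: kv[1])
    storage.rf_table = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return storage


# Hypothetical values for three metabolites; XB did not pass the test step,
# so it has no importance entry.
p_values = {"XA": 0.001, "XB": 0.20, "XC": 0.004}
importances = {"XA": 0.05, "XC": 0.31}

storage = store_results(Storage(), p_values, importances)
print(storage.test_table)
print(storage.rf_table)
```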

Further, the program of the present disclosure for searching for urinary metabolite markers causes a computer to execute:
a step of quantitatively evaluating the importance of urinary metabolites by the random forest (RF) method based on analysis data of the urinary metabolites obtained by a liquid chromatograph mass spectrometer (LC/MS), and selecting urinary metabolites of high importance;
a step of performing discriminant analysis using the analysis data of the selected urinary metabolites; and
a step of determining, based on the results of the discriminant analysis, urinary metabolites associated with a specific disease or condition as marker candidates.

In a specific embodiment, the program of the present disclosure for searching for urinary metabolite markers causes a computer to execute:
a step of, based on analysis data of urinary metabolites obtained by a liquid chromatograph mass spectrometer (LC/MS), optionally narrowing down by the Wilcoxon rank sum test to urinary metabolites that are significantly increased or decreased in patients with a specific disease or condition relative to healthy subjects, then quantitatively evaluating the importance of the urinary metabolites by the random forest (RF) method, and selecting urinary metabolites of high importance;
a step of constructing a test model by discriminant analysis using the selected urinary metabolites as biomarkers; and
optionally, a step of constructing, based on the results of the discriminant analysis, a plurality of test models with different numbers of biomarkers.

In the program of the present disclosure, the step of selecting urinary metabolites of high importance may use the analysis data of the urinary metabolites to narrow down, by the Wilcoxon rank sum test, to urinary metabolites that are significantly increased or decreased in patients with a specific disease or condition relative to healthy subjects, then quantitatively evaluate the importance of the urinary metabolites by the random forest (RF) method, and select urinary metabolites of high importance.

The program of the present disclosure may further cause the computer to execute a step of constructing a test model by discriminant analysis using the selected urinary metabolites as biomarkers.

When read and executed on a computer, the program of the present disclosure can carry out the method for searching for urinary metabolite markers and can operate the system for searching for urinary metabolite markers.

The program can be stored on a computer-readable medium such as a magnetic recording medium (hard disk drive), CD-ROM, CD-R, or RAM and supplied to the computer, or it can be supplied to the computer via wired or wireless communication.

The present invention is described concretely below by way of an example. The example is provided solely to illustrate the present invention and does not limit or restrict its scope.

[Example 1]
Specific calculation examples applying the method of the present invention for searching for urinary metabolites are shown (FIGS. 6 to 15).
These results were obtained using urine samples from healthy Caucasian subjects (15 cases) and breast cancer patients (15 cases). First, a comprehensive analysis of urinary metabolites is performed by LC/MS. Analyzing urinary metabolites by LC/MS gives, for example, the results shown in FIG. 10. That is, 1325 metabolites are detected, including both structure-known and structure-unknown metabolites, such as fatty acids, phospholipids, sugars, amino acids, peptides, amines, organic acids, and nucleic acids. Here, a structure-unknown metabolite is one that produced no hit in a mass spectrum search of public databases. The analysis results can be displayed as shown in the example total metabolite database structure (FIG. 6).

Although it is possible to subject all of these metabolites to the random forest method and quantitatively evaluate their importance, it is effective to first narrow down the metabolites with the Wilcoxon rank sum test in order to reduce the computational load of the random forest method. This test examines whether there is a significant difference between the data of the groups (in this analysis, healthy subjects and breast cancer patients). At a significance level of 5%, the metabolites can be narrowed down to 374 (FIG. 11). The test results are usually presented as a table of urinary metabolites and p-values, as shown in FIG. 7.
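The narrowing step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses the large-sample normal approximation of the rank sum statistic without a tie correction, which is one common way to obtain a two-sided p-value, and the specification does not say which variant was used. The intensity values and metabolite names are invented.

```python
import math


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, no tie correction)."""
    pooled = sorted((v, g) for g, vals in enumerate((group_a, group_b)) for v in vals)
    # Assign 1-based ranks, giving tied values their average rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_a, n_b = len(group_a), len(group_b)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of group_a
    mean = n_a * (n_a + n_b + 1) / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Invented intensities: (healthy values, patient values) per metabolite.
intensities = {
    "metabolite_1": ([1.0, 1.2, 0.9, 1.1, 1.3, 1.05], [2.0, 2.4, 1.9, 2.2, 2.6, 2.1]),
    "metabolite_2": ([1.0, 2.1, 1.5, 1.8, 1.2, 2.4], [1.1, 2.0, 1.6, 1.7, 1.3, 2.3]),
}

ALPHA = 0.05  # significance level of 5%, as in the example
kept = [
    name
    for name, (healthy, patient) in intensities.items()
    if rank_sum_p_value(healthy, patient) < ALPHA
]
print(kept)
```

Here the first metabolite is completely separated between the groups and is kept, while the second is interleaved and is dropped. For small samples an exact test would be preferable to the normal approximation.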

However, the number is still large, and the test alone cannot quantitatively evaluate the importance of individual metabolites, so importance is evaluated by the random forest method. The random forest method is a type of machine learning method whose major feature is that it can calculate the importance of features. The results of the random forest analysis are shown in FIG. 8. The horizontal axis of FIG. 8 is the importance and the vertical axis corresponds to the individual metabolites; the correspondence table is shown in FIG. 9. In FIG. 9, X denotes a structure-unknown metabolite. The metabolites shown in FIG. 9 are biomarker candidates, but it is desirable to exclude metabolites of exogenous substances (for example, drugs) from the candidates. For example, the mass spectra show that the top three are glucuronide conjugates, which are therefore set aside, so the most promising biomarker is XC.
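The idea of ranking features by importance can be illustrated with a deliberately simplified forest. The Python sketch below is an assumption-laden stand-in, not the analysis used in the example: each tree is a depth-1 tree (a decision stump) grown on a bootstrap sample with a random subset of features, and importance is the mean Gini impurity decrease credited to the feature each stump splits on. Real random forest implementations grow deeper trees. The data and the feature named `noise` are invented; XC and XD are labels borrowed from the example.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split_gain(rows, labels, feature):
    """Largest impurity decrease achievable with one threshold on `feature`."""
    parent = gini(labels)
    best = 0.0
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = max(best, parent - weighted)
    return best


def stump_forest_importance(rows, labels, n_trees=200, seed=0):
    rng = random.Random(seed)
    features = sorted(rows[0])
    n_sub = max(1, round(len(features) ** 0.5))  # features tried per tree
    totals = dict.fromkeys(features, 0.0)
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        candidates = rng.sample(features, n_sub)
        gains = {f: best_split_gain(sample_rows, sample_labels, f) for f in candidates}
        winner = max(gains, key=gains.get)
        totals[winner] += gains[winner]
    return {f: total / n_trees for f, total in totals.items()}


# Invented data: XC separates the groups cleanly, XD partly, noise not at all.
rows = [
    {"XC": 0.2, "XD": 1.0, "noise": 0.7}, {"XC": 0.3, "XD": 1.4, "noise": 0.1},
    {"XC": 0.1, "XD": 0.8, "noise": 0.9}, {"XC": 0.4, "XD": 1.9, "noise": 0.4},
    {"XC": 1.5, "XD": 1.2, "noise": 0.6}, {"XC": 1.8, "XD": 2.0, "noise": 0.2},
    {"XC": 1.6, "XD": 1.7, "noise": 0.8}, {"XC": 2.0, "XD": 2.2, "noise": 0.3},
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = healthy, 1 = patient

importance = stump_forest_importance(rows, labels)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

Sorting the features by this score gives a table of the kind described for the RF table, with the cleanly separating feature ranked first.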

Next, a discriminant is calculated by OPLS discriminant analysis using the biomarker. OPLS discriminant analysis is an effective method for identifying differences between groups according to a specific disease or condition. Based on the above considerations, a one-dimensional score plot by OPLS discriminant analysis with XC as the biomarker gives the result shown in FIG. 12. The 15 cases on the left of the graph are healthy subjects and the 15 cases on the right are breast cancer patients. FIG. 13 shows the discrimination results when the score threshold for distinguishing healthy subjects from breast cancer patients is set to 0.65. Of the 15 healthy subjects, 14 were classified as healthy, giving an error rate of 6.7%, whereas all 15 breast cancer patients were classified as having breast cancer, giving an error rate of 0%. This shows that a promising biomarker can be extracted by the analysis flow shown in FIG. 1.
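The error rates quoted above follow from simple counting, as the Python sketch below shows. The individual scores are invented (the specification gives only the counts), and the direction of the rule, with scores at or above the threshold treated as "patient", is an assumption made for this illustration.

```python
def error_rates(scores_healthy, scores_patient, threshold):
    """Fraction of each group that falls on the wrong side of the threshold.

    Assumes scores at or above the threshold are classified as patient.
    """
    healthy_wrong = sum(s >= threshold for s in scores_healthy)
    patient_wrong = sum(s < threshold for s in scores_patient)
    return healthy_wrong / len(scores_healthy), patient_wrong / len(scores_patient)


# Invented scores: 15 per group, with exactly one healthy subject above 0.65.
healthy_scores = [0.12, 0.20, 0.25, 0.30, 0.31, 0.35, 0.40, 0.42,
                  0.45, 0.50, 0.52, 0.55, 0.60, 0.62, 0.70]
patient_scores = [0.68, 0.70, 0.72, 0.75, 0.78, 0.80, 0.82, 0.85,
                  0.88, 0.90, 0.92, 0.95, 0.97, 1.00, 1.05]

healthy_error, patient_error = error_rates(healthy_scores, patient_scores, 0.65)
print(f"healthy: {healthy_error:.1%}, patient: {patient_error:.1%}")
```

One misclassified case out of 15 corresponds to 1/15, which rounds to the 6.7% stated in the text.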

In addition, when a similar analysis is performed using, for example, the metabolites ranked 4th, 5th, 8th, 9th, and 11th in the random forest results (XC, XD, XH, XE, XI, XG), the result shown in FIG. 14 is obtained. With the score threshold for distinguishing healthy subjects from breast cancer patients set to 0.5, all 15 healthy subjects are classified as healthy and all 15 breast cancer patients are classified as having breast cancer, so both error rates are 0% and the analysis accuracy improves. A major feature and advantage of cancer testing with urinary metabolites is that it can ultimately be based on multiple biomarkers. Such multiple biomarkers can also be extracted easily by the analysis flow shown in FIG. 1.

Ultimately, whether to use a single biomarker or multiple biomarkers is chosen by the subject, after the testing side presents data on both analysis cost and analysis accuracy.

Using this cancer test model, an unknown urine sample is analyzed comprehensively, and the value of the discriminant obtained from the training data (the sum of the ion intensities corresponding to the biomarkers, each multiplied by a coefficient, plus a constant) is evaluated to determine whether the unknown urine sample is derived from a healthy subject or a cancer patient. Alternatively, urinary metabolites that differ significantly between cancer patients and healthy subjects can be used as biomarkers to determine, from the LC/MS analysis data of the urinary metabolites, whether a urine sample is derived from a healthy subject or a cancer patient.
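The discriminant described above, a weighted sum of biomarker ion intensities plus a constant, can be written out as a short Python sketch. The coefficients, constant, threshold, intensities, and the rule that a score at or above the threshold means "patient" are all hypothetical values chosen for illustration; the specification does not disclose the fitted numbers.

```python
def discriminant_score(intensities, coefficients, constant):
    """Sum of (coefficient x ion intensity) over the biomarkers, plus a constant."""
    return constant + sum(c * intensities[name] for name, c in coefficients.items())


def classify(intensities, model, threshold):
    """Assumes a score at or above the threshold indicates a patient sample."""
    score = discriminant_score(intensities, model["coefficients"], model["constant"])
    return "patient" if score >= threshold else "healthy"


# Hypothetical two-marker model and two unknown samples.
model = {"coefficients": {"XC": 0.4, "XD": 0.2}, "constant": 0.1}
sample_a = {"XC": 1.5, "XD": 0.5, "other": 9.0}  # extra peaks are ignored
sample_b = {"XC": 0.2, "XD": 0.1}

result_a = classify(sample_a, model, threshold=0.5)
result_b = classify(sample_b, model, threshold=0.5)
print(result_a, result_b)
```

Only the ions named in the model contribute to the score, so a comprehensive analysis of the unknown sample can be reduced to the few intensities the chosen test model needs.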

Claims

A method for searching for a urinary metabolite marker, comprising:
(a) a step of subjecting a urine sample to a liquid chromatograph mass spectrometer (LC/MS) and analyzing urinary metabolites in the urine sample;
(b) a step of quantitatively evaluating the importance of the urinary metabolites by a random forest method based on the analysis data of the urinary metabolites, and selecting urinary metabolites of high importance;
(c) a step of performing a discriminant analysis method using the analysis data of the selected urinary metabolites; and
(d) a step of determining, based on the result of the discriminant analysis, a urinary metabolite associated with a specific disease or condition as a marker candidate.

The method according to claim 1, wherein in step (b), using the analysis data of the urinary metabolites, urinary metabolites that are significantly increased or decreased in patients with a specific disease or condition relative to healthy subjects are narrowed down by the Wilcoxon rank sum test, after which the importance of the urinary metabolites is quantitatively evaluated by a random forest method and urinary metabolites of high importance are selected.

The method according to claim 1, wherein the discriminant analysis method is an OPLS discriminant analysis method or a kernel discriminant analysis method.

The method according to claim 1, wherein in step (c), a test model by a discriminant analysis method is constructed using the selected urinary metabolite as a biomarker.

The method according to claim 4, further comprising a step of constructing a plurality of test models having different numbers of biomarkers based on the result of the discriminant analysis.

The method according to claim 1, wherein the specific disease or condition is selected from the group consisting of the presence or absence of cancer, the type of cancer, the degree of cancer progression, the malignancy of cancer, the stage of cancer, the prognosis of cancer, the therapeutic effect on cancer, and sensitivity to anticancer agents.

A system for searching for a urinary metabolite marker, comprising:
a counting unit to which analysis data of urinary metabolites obtained by a liquid chromatograph mass spectrometer (LC/MS) are input;
a calculation unit that performs numerical analysis using the analysis data of the urinary metabolites;
a determination unit that selects urinary metabolites and/or determines urinary metabolites as marker candidates based on the results of the analysis; and
an output processing unit that processes the determination result of the determination unit for output,
wherein the calculation unit quantitatively evaluates the importance of the urinary metabolites by a random forest method using the analysis data of the urinary metabolites, and performs a discriminant analysis method using the analysis data of the urinary metabolites, and
the determination unit selects urinary metabolites of high importance based on the result of the evaluation by the random forest method, and determines a urinary metabolite as a marker candidate based on the result of the discriminant analysis method.

The system according to claim 7, wherein the calculation unit performs the Wilcoxon rank sum test using the analysis data of the urinary metabolites, and
the determination unit, based on the result of the Wilcoxon rank sum test, narrows down to urinary metabolites that are significantly increased or decreased in patients with a specific disease or condition relative to healthy subjects.

The system according to claim 7, wherein the discriminant analysis method is an OPLS discriminant analysis method or a kernel discriminant analysis method.

The system according to claim 7, wherein the calculation unit performs a discriminant analysis method using the selected urinary metabolites as biomarkers, and
the determination unit constructs a test model by the discriminant analysis method.

The system according to claim 7, wherein the urinary metabolite marker is associated with at least one specific disease or condition selected from the group consisting of the presence or absence of cancer, the type of cancer, the degree of cancer progression, the malignancy of cancer, the stage of cancer, the prognosis of cancer, the therapeutic effect on cancer, and sensitivity to anticancer agents.

A program for searching for a urinary metabolite marker, the program causing a computer to execute:
a step of quantitatively evaluating the importance of urinary metabolites by a random forest method based on analysis data of the urinary metabolites obtained by a liquid chromatograph mass spectrometer (LC/MS), and selecting urinary metabolites of high importance;
a step of performing a discriminant analysis method using the analysis data of the selected urinary metabolites; and
a step of determining, based on the result of the discriminant analysis, a urinary metabolite associated with a specific disease or condition as a marker candidate.

The program according to claim 12, wherein the step of selecting urinary metabolites of high importance uses the analysis data of the urinary metabolites to narrow down, by the Wilcoxon rank sum test, to urinary metabolites that are significantly increased or decreased in patients with a specific disease or condition relative to healthy subjects, then quantitatively evaluates the importance of the urinary metabolites by a random forest method, and selects urinary metabolites of high importance.

The program according to claim 12, wherein the discriminant analysis method is an OPLS discriminant analysis method or a kernel discriminant analysis method.

The program according to claim 12, which further causes the computer to execute a step of constructing a test model by a discriminant analysis method using the selected urinary metabolites as biomarkers.