CN116153391B - Antiviral drug screening method, system and storage medium based on joint projection - Google Patents

Info

Publication number
CN116153391B
Authority
CN
China
Prior art keywords
matrix
drug
virus
similarity matrix
integration
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310418161.1A
Other languages
Chinese (zh)
Other versions
CN116153391A (en)
Inventor
汤永
王珊
李顺飞
刘建超
刘丽华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese PLA General Hospital
Original Assignee
Chinese PLA General Hospital
Application filed by Chinese PLA General Hospital
Priority to CN202310418161.1A
Publication of CN116153391A
Application granted
Publication of CN116153391B
Legal status: Active (current)

Classifications

    • G PHYSICS
      • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
        • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
          • G16B15/00 ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment
            • G16B15/30 Drug targeting using structural data; Docking or binding prediction
          • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
            • G16B30/10 Sequence alignment; Homology search
          • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
        • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
          • G16C10/00 Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like
          • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
            • G16C20/50 Molecular design, e.g. of drugs
            • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing
        • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
          • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
            • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
          • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
            • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Primary Health Care (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides an antiviral drug screening method, system and storage medium based on joint projection, belonging to the interdisciplinary technical field of bioinformatics, computational biology and artificial intelligence. The method is implemented by the system and comprises the following steps: S1. construct a virus-drug association adjacency matrix; S2. calculate a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix; S3. calculate a viral gene sequence similarity matrix and a drug chemical structure similarity matrix; S4. integrate these with a fast kernel learning method to obtain a virus integrated similarity matrix and a drug integrated similarity matrix; S5. construct a loss function using a sparse regularized joint projection method; S6. solve the loss function to obtain a virus-drug prediction score matrix; S7. screen and sort the virus-drug prediction score matrix to obtain the final prediction result. The invention can effectively overcome the adverse effects of noisy data and accurately and efficiently screen for drugs that are effective against viruses.

Figure 202310418161

Description

Antiviral Drug Screening Method, System and Storage Medium Based on Joint Projection

Technical Field

The present invention relates to the interdisciplinary technical field of bioinformatics, computational biology and artificial intelligence, and in particular to an antiviral drug screening method, system and storage medium based on joint projection.

Background Art

Studies have indicated that developing an effective new clinical drug generally costs billions of dollars and takes about 9 to 12 years on average to reach the market. Drug repurposing (also known as drug repositioning), that is, finding new indications for existing drugs, has therefore become a promising way to improve the productivity of new drug development. Exploratory laboratory experiments are often expensive and time-consuming. Computational modeling can identify potential drug candidates with relatively high accuracy in a short time, and validating only these prioritized candidates in "wet-lab" experiments can significantly reduce the workload and accelerate research and development.

Existing research on virus-related drug screening includes using molecular dynamics simulation and free-energy calculations to simulate viral invasion and target binding and thereby infer preferred drugs, as well as identifying potentially effective drugs by introducing point mutations based on structural information. These methods are often time-consuming or rely heavily on manual intervention. In addition, little information is usually available for newly discovered viruses, so the predictive performance of such models is unsatisfactory.

Summary of the Invention

The present invention provides an antiviral drug screening method, system and storage medium based on joint projection, which can accurately and efficiently predict antiviral drugs from virus-drug association pairs, viral genome sequences and drug chemical structure data.

A first aspect of the embodiments of this specification discloses an antiviral drug screening method based on joint projection, comprising the following steps:

S1. Construct the virus-drug association adjacency matrix;

S2. Based on the virus-drug association adjacency matrix, calculate a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix;

S3. Calculate a viral gene sequence similarity matrix from the viral genome sequences, and a drug chemical structure similarity matrix from the drug chemical structures;

S4. Integrate the virus Gaussian distance similarity matrix and the viral gene sequence similarity matrix into a virus integrated similarity matrix using a fast kernel learning method; likewise, integrate the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix into a drug integrated similarity matrix using the fast kernel learning method;

S5. Construct a loss function from the virus-drug association adjacency matrix, the virus integrated similarity matrix and the drug integrated similarity matrix using a sparse regularized joint projection method;

S6. Solve the loss function to obtain a virus-drug prediction score matrix;

S7. From the virus-drug prediction score matrix, extract the scores in the row of the target virus and sort them to obtain the final prediction result.
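Step S7 amounts to reading one row of the prediction score matrix and sorting it. The minimal Python sketch below assumes the score matrix F from S6 is a NumPy array with viruses as rows and drugs as columns, and that `drug_names` lists the drugs in column order; the function name `rank_drugs_for_virus` and the `top_k` cutoff are illustrative and not part of the patent. It simply ranks every drug in the row, including drugs already known to be associated with the virus.

```python
import numpy as np

def rank_drugs_for_virus(F, virus_idx, drug_names, top_k=10):
    """Return the top_k (drug_name, score) pairs for one target virus.

    F          : virus-drug prediction score matrix (nv x nd) from S6
    virus_idx  : row index of the target virus
    drug_names : drug names in column order
    """
    row = np.asarray(F, dtype=float)[virus_idx]
    # argsort is ascending, so reverse it to put the highest scores first.
    order = np.argsort(row)[::-1][:top_k]
    return [(drug_names[j], float(row[j])) for j in order]

if __name__ == "__main__":
    F = np.array([[0.1, 0.9, 0.4],
                  [0.7, 0.2, 0.5]])          # toy 2-virus x 3-drug score matrix
    print(rank_drugs_for_virus(F, 0, ["drugA", "drugB", "drugC"], top_k=2))
    # [('drugB', 0.9), ('drugC', 0.4)]
```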

In an embodiment disclosed in this specification, in S1:

the known virus-drug association pairs are input to construct the virus-drug association adjacency matrix Y;

an entry is 1 if the corresponding virus-drug pair is a known association, and 0 otherwise;

the number of rows of Y is the number of viruses nv, and the number of columns is the number of drugs nd.
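A minimal sketch of S1 in Python with NumPy, assuming the known associations arrive as (virus, drug) name pairs and that the row and column orders are supplied as lists. The function name `build_adjacency` and the placeholder names V1, D1, etc. are hypothetical.

```python
import numpy as np

def build_adjacency(pairs, viruses, drugs):
    """Build the nv x nd virus-drug association matrix Y from known pairs.

    pairs   : iterable of (virus_name, drug_name) known associations
    viruses : virus names, defining the row order (nv rows)
    drugs   : drug names, defining the column order (nd columns)
    """
    v_index = {v: i for i, v in enumerate(viruses)}
    d_index = {d: j for j, d in enumerate(drugs)}
    Y = np.zeros((len(viruses), len(drugs)), dtype=float)
    for v, d in pairs:
        Y[v_index[v], d_index[d]] = 1.0  # known association -> 1; every other entry stays 0
    return Y

if __name__ == "__main__":
    pairs = [("V1", "D1"), ("V1", "D3"), ("V2", "D2")]
    print(build_adjacency(pairs, ["V1", "V2"], ["D1", "D2", "D3"]))
```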

In an embodiment disclosed in this specification, in S2:

If drug d(i) is associated with a given virus, the corresponding position is set to 1, otherwise 0. This yields a 1×nv binary vector, the interaction profile IP(d(i)) of drug d(i). The Gaussian distance similarity between drugs d(i) and d(j) is then calculated as:

$$S_1^d(i,j) = \exp\left(-\gamma_d \left\| IP(d(i)) - IP(d(j)) \right\|^2\right)$$

where IP(d(j)) is the interaction profile of drug d(j), and the parameter γ_d controls the kernel bandwidth; it is obtained by normalizing a new bandwidth parameter γ'_d:

$$\gamma_d = \frac{\gamma'_d}{\frac{1}{n_d}\sum_{i=1}^{n_d} \left\| IP(d(i)) \right\|^2}$$

The Gaussian distance similarity between viruses v(i) and v(j) is defined in the same way: the 1×nd binary vector recording which drugs virus v(i) is associated with is its interaction profile IP(v(i)), and the Gaussian distance similarity between viruses v(i) and v(j) is:

$$S_1^v(i,j) = \exp\left(-\gamma_v \left\| IP(v(i)) - IP(v(j)) \right\|^2\right)$$

where IP(v(j)) is the interaction profile of virus v(j), and the parameter γ_v controls the kernel bandwidth; it is obtained by normalizing a new bandwidth parameter γ'_v:

$$\gamma_v = \frac{\gamma'_v}{\frac{1}{n_v}\sum_{i=1}^{n_v} \left\| IP(v(i)) \right\|^2}$$

Both γ'_d and γ'_v are constants.
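The S2 computation can be sketched in vectorised NumPy form. This assumes the standard Gaussian interaction profile kernel, in which the bandwidth γ is γ' divided by the mean squared norm of the profiles; `gip_similarity` is a hypothetical helper name. Virus similarities use the rows of Y (1×nd profiles) and drug similarities use the rows of Yᵀ (1×nv profiles). The sketch raises an error when every profile is empty, since the bandwidth is then undefined.

```python
import numpy as np

def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile similarity between the rows of `profiles`.

    profiles    : (n x m) binary matrix; row i is the interaction profile IP(i)
    gamma_prime : the constant gamma' that sets the normalised bandwidth
    """
    P = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(P ** 2, axis=1)          # ||IP(i)||^2 for every row
    mean_sq = np.mean(sq_norms)
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty; bandwidth is undefined")
    gamma = gamma_prime / mean_sq              # gamma = gamma' / ((1/n) * sum_i ||IP(i)||^2)
    # Pairwise squared Euclidean distances ||IP(i) - IP(j)||^2.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (P @ P.T)
    sq_dist = np.maximum(sq_dist, 0.0)         # clip tiny negative round-off
    return np.exp(-gamma * sq_dist)

if __name__ == "__main__":
    Y = np.array([[1, 0, 1],
                  [0, 1, 0]])                  # toy 2-virus x 3-drug matrix
    S1_v = gip_similarity(Y)                   # 2 x 2 virus matrix
    S1_d = gip_similarity(Y.T)                 # 3 x 3 drug matrix
    print(np.round(S1_v, 4))
```

With γ' = 1 as in the embodiment, every entry lies in (0, 1] and the diagonal is exactly 1.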

In an embodiment disclosed in this specification, in S3:

the viral gene sequence similarity matrix is calculated from the viral genome sequences using a multiple sequence alignment method;

MACCS fingerprints of the drugs are obtained from their chemical structures, and the drug chemical structure similarity matrix is calculated using the Tanimoto coefficient.
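Both S3 similarities can be sketched in NumPy once the inputs are prepared. For viruses, the sketch assumes the multiple sequence alignment has already been read into equal-length aligned strings with "-" as the gap symbol, and scores each pair by percent identity over gap-free columns; the patent does not state how alignment output is turned into a similarity, so that scoring rule is an assumption. For drugs, it assumes MACCS fingerprints are already available as 0/1 vectors (with RDKit these can be produced by `rdkit.Chem.MACCSkeys.GenMACCSKeys`) and applies the Tanimoto coefficient |a AND b| / |a OR b|. All function names are hypothetical.

```python
import numpy as np

def alignment_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical residues over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != gap and b != gap]
    if not cols:
        return 0.0
    return sum(a == b for a, b in cols) / len(cols)

def sequence_similarity_matrix(aligned_seqs):
    """Symmetric viral sequence similarity matrix (S_2^v in the text) from aligned sequences."""
    n = len(aligned_seqs)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            S[i, j] = S[j, i] = alignment_identity(aligned_seqs[i], aligned_seqs[j])
    return S

def tanimoto_matrix(fingerprints):
    """Pairwise Tanimoto coefficients between binary fingerprints (one per row), i.e. S_2^d."""
    F = np.asarray(fingerprints, dtype=bool).astype(int)
    inter = F @ F.T                                     # |a AND b| for every pair
    counts = F.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter   # |a OR b|
    # Pairs of all-zero fingerprints get 0 here (a convention chosen for this sketch).
    return np.divide(inter, union, out=np.zeros(inter.shape), where=union > 0)

if __name__ == "__main__":
    print(sequence_similarity_matrix(["AC-GT", "ACTGA", "ACTGT"]))
    print(tanimoto_matrix([[1, 1, 0, 1], [1, 0, 0, 1]]))
```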

In an embodiment disclosed in this specification, in S4:

the semidefinite programming formulation of the fast kernel learning method is:

Figure SMS_5

In this formula, the first term is the reconstruction loss (norm) term, which measures the integration error of the similarity matrices; the second term is a regularization term that prevents overfitting. Y is the virus-drug association adjacency matrix; S_j^v (j = 1, 2) denote the virus Gaussian distance similarity matrix and the viral gene sequence similarity matrix, respectively; μ_v is a regularization parameter; and λ_v ∈ R^{1×2} is the coefficient vector to be solved. The virus integrated similarity matrix is obtained from λ_v:

$$S_v = \lambda_v^1 S_1^v + \lambda_v^2 S_2^v$$

Similarly, the integration parameters λ_d ∈ R^{1×2} for the drug chemical structure similarity matrix and the drug Gaussian distance similarity matrix are obtained in the same way, and the drug integrated similarity matrix is then calculated as:

$$S_d = \lambda_d^1 S_1^d + \lambda_d^2 S_2^d$$

where S_j^d (j = 1, 2) denote the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix, respectively.
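The exact semidefinite program is given only as a formula image and is not reproduced here. As a rough, clearly simplified stand-in, the sketch below assumes the target of the reconstruction term is Y·Yᵀ for viruses (Yᵀ·Y for drugs), solves the unconstrained ridge problem min over λ of ||target - Σ_j λ_j S_j||_F² + μ||λ||² in closed form, then clips negative weights and rescales them to sum to one before forming S = λ1·S1 + λ2·S2. The target choice, the clipping and rescaling, and the name `integrate_similarities` are assumptions, not the patent's solver.

```python
import numpy as np

def integrate_similarities(mats, target, mu=1.0):
    """Weighted sum S = sum_j lam[j] * mats[j] with data-driven weights.

    Simplified stand-in for the fast kernel learning step (not the patent's
    semidefinite program): solve
        min_lam ||target - sum_j lam[j] * mats[j]||_F^2 + mu * ||lam||^2
    in closed form, then clip negative weights and rescale them to sum to 1.
    """
    mats = [np.asarray(m, dtype=float) for m in mats]
    target = np.asarray(target, dtype=float)
    k = len(mats)
    # Gram matrix of Frobenius inner products <S_a, S_b>_F.
    G = np.array([[np.sum(mats[a] * mats[b]) for b in range(k)] for a in range(k)])
    rhs = np.array([np.sum(m * target) for m in mats])   # <S_j, target>_F
    lam = np.linalg.solve(G + mu * np.eye(k), rhs)        # ridge normal equations
    lam = np.clip(lam, 0.0, None)
    lam = lam / lam.sum() if lam.sum() > 0 else np.full(k, 1.0 / k)
    S = sum(w * m for w, m in zip(lam, mats))
    return S, lam

if __name__ == "__main__":
    Y = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)    # toy association matrix
    S1_v = np.array([[1.0, 0.14], [0.14, 1.0]])          # e.g. a Gaussian similarity matrix
    S2_v = np.array([[1.0, 0.60], [0.60, 1.0]])          # e.g. a sequence similarity matrix
    S_v, lam_v = integrate_similarities([S1_v, S2_v], Y @ Y.T, mu=1.0)
    print(lam_v)
```

Keeping the weights non-negative and summing to one keeps the integrated matrix on the same scale as its inputs; the patent's own constraints, if any, are inside the formula image.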

In an embodiment disclosed in this specification, in S5:

the loss function constructed with the sparse regularized joint projection method is:

Figure SMS_8

where

Figure SMS_9

is the manifold regularization term; W1 ∈ R^{d1×nd} and W2 ∈ R^{d2×nv} are the coefficient matrices to be solved; P ∈ R^{d1×d2} is the projection matrix; J(W1, W2, P) denotes the value of the right-hand side of the equation, which varies with W1, W2 and P; Y ∈ R^{nv×nd} is the known virus-drug association matrix, where nv and nd are the numbers of viruses and drugs, respectively; A and B are low-rank matrices obtained by decomposing the virus integrated similarity matrix S_v ∈ R^{nv×nv} and the drug integrated similarity matrix S_d ∈ R^{nd×nd}, i.e., S_v ≈ AAᵀ and S_d ≈ BBᵀ, where A ∈ R^{nv×d1} and B ∈ R^{nd×d2} represent the latent features of viruses and drugs and d1 and d2 are the ranks of S_v and S_d, respectively; λ1, λ2, λ3 and α are regularization parameters; ||·||_F denotes the Frobenius norm, ||·||_* the nuclear norm, and ||·||_{2,1} the L2,1 norm; L1 ∈ R^{n×n} and L2 ∈ R^{m×m} are normalized graph Laplacian matrices, calculated as:

Figure SMS_10

Figure SMS_11

where D_v ∈ R^{nv×nv} and D_d ∈ R^{nd×nd} are diagonal matrices whose elements are calculated as:

$$D_v(i,i) = \sum_{j=1}^{n_v} S_v(i,j), \qquad D_d(i,i) = \sum_{j=1}^{n_d} S_d(i,j)$$

In an embodiment disclosed in this specification, in S6:

To solve the loss function, take the partial derivatives of J with respect to W1 and W2 and set them to zero, which gives the following formulas for W1 and W2:

Figure SMS_13

Figure SMS_14

where Z1 ∈ R^{d1×d1} and Z2 ∈ R^{d2×d2} are diagonal matrices satisfying:

Figure SMS_15

where ||w1^i||_2 and ||w2^i||_2 denote the 2-norms of the i-th rows of W1 and W2, respectively;
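The exact expression for Z1 and Z2 is given only as a formula image. The sketch below assumes the form most L2,1-norm solvers use, Z(i,i) = 1 / (2·||w^i||_2), with a small ε guarding all-zero rows. With that choice tr(Wᵀ Z W) equals ½·||W||_{2,1} at the current W, which is what lets an iterative solver replace the non-smooth L2,1 term with a quadratic one at each step. `l21_reweight` and `l21_norm` are hypothetical names.

```python
import numpy as np

def l21_reweight(W, eps=1e-8):
    """Diagonal reweighting matrix Z for the L2,1 norm of W (one entry per row of W).

    Assumes Z(i, i) = 1 / (2 * ||w^i||_2); eps keeps all-zero rows finite.
    """
    row_norms = np.linalg.norm(np.asarray(W, dtype=float), axis=1)  # ||w^i||_2
    return np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))

def l21_norm(W):
    """||W||_{2,1}: the sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

if __name__ == "__main__":
    W1 = np.array([[3.0, 4.0, 0.0],
                   [0.0, 1.0, 0.0]])           # toy d1 x nd coefficient matrix
    Z1 = l21_reweight(W1)
    print(np.diag(Z1), np.trace(W1.T @ Z1 @ W1), 0.5 * l21_norm(W1))
```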

To obtain the partial derivative of the nuclear norm of P, first compute the singular value decomposition (SVD) P = UΣVᵀ, where U ∈ R^{d1×d1} and V ∈ R^{d2×d2}; then construct E ∈ R^{d2×d1}: if d1 > d2, E consists of the first d2 rows of the identity matrix I ∈ R^{d1×d1}; otherwise, it consists of the first d1 rows of the identity matrix I ∈ R^{d2×d2}. P is then calculated as follows:

Figure SMS_16

A second aspect of the embodiments of this specification discloses an antiviral drug screening system based on joint projection, comprising:

an adjacency matrix construction module, configured to construct the virus-drug association adjacency matrix;

a Gaussian distance similarity matrix calculation module, configured to calculate the virus Gaussian distance similarity matrix and the drug Gaussian distance similarity matrix based on the virus-drug association adjacency matrix;

a viral gene sequence and drug chemical structure similarity matrix calculation module, configured to calculate the viral gene sequence similarity matrix from the viral genome sequences and the drug chemical structure similarity matrix from the drug chemical structures;

an integrated similarity matrix calculation module, configured to integrate the virus Gaussian distance similarity matrix and the viral gene sequence similarity matrix into the virus integrated similarity matrix, and the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix into the drug integrated similarity matrix, in both cases using the fast kernel learning method;

a loss function construction module, configured to construct the loss function from the virus-drug association adjacency matrix, the virus integrated similarity matrix and the drug integrated similarity matrix using the sparse regularized joint projection method;

a loss function solving module, configured to solve the loss function to obtain the virus-drug prediction score matrix;

a prediction module, configured to extract the scores in the row of the target virus from the virus-drug prediction score matrix and sort them to obtain the final prediction result.

In an embodiment disclosed in this specification, the antiviral drug screening system based on joint projection further comprises:

a processor, connected to each of the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the viral gene sequence and drug chemical structure similarity matrix calculation module, the integrated similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module;

a memory, connected to the processor and storing a computer program executable on the processor;

wherein, when the processor executes the computer program, the processor controls the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the viral gene sequence and drug chemical structure similarity matrix calculation module, the integrated similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module to implement the antiviral drug screening method based on joint projection according to any one of the above.

A third aspect of the embodiments of this specification discloses a computer-readable storage medium storing computer instructions; when a computer reads the computer instructions, the computer performs the antiviral drug screening method based on joint projection according to any one of the above.

In summary, the present invention has at least the following beneficial effects:

The present invention constructs the virus-drug association adjacency matrix and computes the virus and drug Gaussian distance similarity matrices from it; computes the viral gene sequence similarity matrix from viral genome sequences and the drug chemical structure similarity matrix from drug chemical structure information; computes the virus and drug integrated similarity matrices with a fast kernel learning method; builds a loss function that combines matrix decomposition with sparse joint projection; solves it iteratively to obtain the virus-drug association prediction score matrix; and screens and sorts the scores to obtain the final result. The invention can effectively overcome the adverse effects of noisy data, rapidly, accurately and efficiently screen for drugs that are effective against viruses, and avoid the long duration and high cost of biomedical experimental methods.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the steps of the antiviral drug screening method based on joint projection according to the present invention.

FIG. 2 is a schematic flowchart of the antiviral drug screening method based on joint projection according to the present invention.

FIG. 3 compares the five-fold cross-validation results of the antiviral drug screening method based on joint projection according to the present invention with those of baseline methods.

FIG. 4 is a schematic diagram of the antiviral drug screening system based on joint projection according to the present invention.

Detailed Description of the Embodiments

Only certain exemplary embodiments are briefly described below. As those skilled in the art will recognize, the described embodiments may be modified in various ways without departing from the spirit or scope of the embodiments of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not restrictive.

The following disclosure provides many different implementations or examples for realizing different structures of the embodiments of the present invention. To simplify the disclosure, components and arrangements of specific examples are described below; they are merely examples and are not intended to limit the embodiments of the present invention. In addition, reference numerals and/or letters may be repeated in different examples; this repetition is for simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or arrangements discussed.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Note that the known human drug-virus association data used in the embodiments of this specification were collected from the relevant literature: experimentally validated drug-virus interaction pairs reported in the literature were compiled using text mining, yielding 455 confirmed human virus-drug interaction pairs involving 34 viruses and 219 drugs (DOI: 10.1016/j.asoc.2021.107135). Drug chemical structures were downloaded from the DrugBank database, and viral genome nucleotide sequences were obtained from the database of the US National Center for Biotechnology Information (NCBI).

As shown in FIG. 1 and FIG. 2, a first aspect of the embodiments of this specification discloses an antiviral drug screening method based on joint projection, comprising the following steps:

S1. Construct the virus-drug association adjacency matrix.

The known virus-drug association pairs are input to construct the adjacency matrix Y:

$$Y_{ij} = \begin{cases} 1, & \text{if virus } v(i) \text{ is associated with drug } d(j) \\ 0, & \text{otherwise} \end{cases}$$

The resulting adjacency matrix Y has entries of 0 or 1 and a size of 34 rows × 219 columns, with indices satisfying 1 ≤ i ≤ 34 and 1 ≤ j ≤ 219.
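For scale, the density of Y follows directly from the counts given for this embodiment (455 known pairs in a 34 × 219 matrix):

$$\frac{455}{34 \times 219} = \frac{455}{7446} \approx 0.0611$$

so roughly 6.1% of the entries of Y are 1 and about 93.9% are 0.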

S2. Based on the virus-drug association adjacency matrix, calculate the virus Gaussian distance similarity matrix and the drug Gaussian distance similarity matrix.

If drug d(i) is associated with a given virus, the corresponding position is set to 1, otherwise 0. This yields a 1×34 binary vector, the interaction profile IP(d(i)) of drug d(i). The Gaussian distance similarity between drugs d(i) and d(j) is then calculated as:

$$S_1^d(i,j) = \exp\left(-\gamma_d \left\| IP(d(i)) - IP(d(j)) \right\|^2\right)$$

where IP(d(j)) is the interaction profile of drug d(j), and the parameter γ_d controls the kernel bandwidth; it is obtained by normalizing a new bandwidth parameter γ'_d:

$$\gamma_d = \frac{\gamma'_d}{\frac{1}{n_d}\sum_{i=1}^{n_d} \left\| IP(d(i)) \right\|^2}$$

The Gaussian distance similarity between viruses v(i) and v(j) is defined in a similar way: if virus v(i) is associated with a given drug, the corresponding position is set to 1, otherwise 0. This yields a 1×219 binary vector, the interaction profile IP(v(i)) of virus v(i), and the Gaussian distance similarity between viruses v(i) and v(j) is calculated as:

$$S_v(v(i),v(j)) = \exp\left(-\gamma_v \lVert IP(v(i)) - IP(v(j)) \rVert^2\right);$$

Here IP(v(j)) is the vector spectrum of virus v(j), and the parameter γ_v, which controls the kernel bandwidth, is obtained by normalizing a new bandwidth parameter γ'_v:

$$\gamma_v = \gamma'_v \Big/ \left(\frac{1}{n_v}\sum_{i=1}^{n_v} \lVert IP(v(i)) \rVert^2\right);$$

Both γ'_d and γ'_v are constants; here γ'_d = γ'_v = 1. nv denotes the number of viruses (34 in this example) and nd the number of drugs (219 in this example). This step yields a 34×34 symmetric matrix S^1_v (the virus Gaussian distance similarity matrix) and a 219×219 symmetric matrix S^1_d (the drug Gaussian distance similarity matrix), all of whose elements lie between 0 and 1.
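A minimal NumPy sketch of step S2, assuming the bandwidth normalization shown above (mean squared norm of the vector spectra). Virus profiles are the rows of Y and drug profiles are the rows of Yᵀ. The function name `gip_similarity` is hypothetical, and the code assumes at least one profile is nonzero.

```python
import numpy as np

def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian distance (interaction profile) similarity between rows of `profiles`.

    gamma = gamma_prime / mean(||IP||^2); assumes at least one nonzero row.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    gamma = gamma_prime / np.mean(sq_norms)
    # Pairwise squared distances ||IP(i) - IP(j)||^2 via the norm expansion
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negative rounding errors
    return np.exp(-gamma * sq_dist)

Y = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 1]])
S1_v = gip_similarity(Y)    # virus similarity, 3 x 3 (rows of Y)
S1_d = gip_similarity(Y.T)  # drug similarity, 4 x 4 (rows of Y^T)
```

The diagonal is always 1 because each profile is at distance 0 from itself. This is why both matrices are symmetric with entries in (0, 1].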

S3. Calculate the viral gene sequence similarity matrix from the viral genome sequences, and calculate the drug chemical structure similarity matrix from the drug chemical structures.

Input the viral genome sequences and use the multiple sequence alignment tool MAFFT to obtain the viral gene sequence similarity matrix S^2_v. Input the drug chemical structures encoded as SMILES, obtain the Molecular ACCess System (MACCS) fingerprint of each drug with the cheminformatics software RDKit or Open Babel, and then compute the Tanimoto similarity with the R package RxnSim to obtain the drug chemical structure similarity matrix S^2_d. Specifically, for two drugs d(i) and d(j), denote the sets of binary MACCS fragment strings of the two drugs by D(i) and D(j). The similarity S^d_ij between d(i) and d(j) is then calculated as:

$$S^d_{ij} = \frac{\lvert D(i) \cap D(j) \rvert}{\lvert D(i) \cup D(j) \rvert};$$
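The Tanimoto coefficient can be sketched in plain Python, with each fingerprint represented as a set of "on" bit positions. This is a stand-in for the RxnSim step described above, not that package's API. With RDKit, the on-bit sets could come from `MACCSkeys.GenMACCSKeys(mol).GetOnBits()`. The function names and toy fingerprints are hypothetical.

```python
def tanimoto(bits_i, bits_j):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    a, b = set(bits_i), set(bits_j)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

def tanimoto_matrix(fingerprints):
    """Symmetric matrix of pairwise Tanimoto similarities."""
    n = len(fingerprints)
    return [[tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical on-bit sets for three drugs
fps = [{1, 2, 3}, {2, 3, 4}, {10}]
S2_d = tanimoto_matrix(fps)
```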

S4. Integrate the virus Gaussian distance similarity matrix and the viral gene sequence similarity matrix with a fast kernel learning method to obtain the virus integrated similarity matrix; integrate the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix with the fast kernel learning method to obtain the drug integrated similarity matrix.

The semidefinite programming formulation of the fast kernel learning method is:

[Equation SMS_23: fast kernel learning objective (image not reproduced)];

In this formula, the first term is the reconstruction error term, which measures the integration error of the similarity matrices; the second term is a regularization term that prevents overfitting. Y is the virus-drug association adjacency matrix; S^v_j (j = 1, 2) denote the virus Gaussian distance similarity matrix and the viral gene sequence similarity matrix, respectively; μ_v is the regularization parameter; and λ_v ∈ R^{1×2} is the coefficient vector to be solved. This optimization problem can be solved with the CVX toolbox in Matlab, and the virus integrated similarity matrix is obtained from λ_v:

$$S_v = \sum_{j=1}^{2} \lambda_v^{j} S_v^{j};$$

Similarly, the integration parameters λ_d ∈ R^{1×2} for the drug structure similarity matrix and the drug Gaussian distance similarity matrix are obtained in the same way. The drug integrated similarity matrix is then calculated as:

$$S_d = \sum_{j=1}^{2} \lambda_d^{j} S_d^{j};$$

where S^d_j (j = 1, 2) denote the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix, respectively.
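Because the objective image is not reproduced, the sketch below assumes a common fast-kernel-learning form: minimize ||T − λ1·S1 − λ2·S2||_F² + μ·||λ||² subject to λ ≥ 0 and λ1 + λ2 = 1. Here the "ideal" target kernel is assumed to be T = Y·Yᵀ for viruses (Yᵀ·Y for drugs). With two kernels this reduces to a one-dimensional convex quadratic in t = λ1, which can be solved in closed form instead of with CVX. All names are hypothetical.

```python
import numpy as np

def fkl_two_kernel_weights(target, S1, S2, mu):
    """Weights (t, 1 - t) minimising
    ||target - t*S1 - (1-t)*S2||_F^2 + mu*(t^2 + (1-t)^2) over 0 <= t <= 1.

    ASSUMED objective; the patent solves its own formulation with CVX in Matlab.
    """
    delta = S1 - S2
    resid = target - S2
    denom = np.sum(delta * delta) + 2.0 * mu
    if denom == 0.0:
        return np.array([0.5, 0.5])  # S1 == S2 and mu == 0: every split is optimal
    t = (np.sum(resid * delta) + mu) / denom
    t = float(np.clip(t, 0.0, 1.0))  # convex 1-D problem: clipping gives the constrained optimum
    return np.array([t, 1.0 - t])

def integrate(weights, S1, S2):
    """Integrated similarity S = lambda_1 * S1 + lambda_2 * S2."""
    return weights[0] * S1 + weights[1] * S2

# Toy check: S_gauss matches the target exactly, so it should get all the weight.
target = np.array([[2.0, 0.0], [0.0, 1.0]])
S_gauss = target.copy()
S_seq = np.eye(2)
lam_v = fkl_two_kernel_weights(target, S_gauss, S_seq, mu=0.0)
S_v = integrate(lam_v, S_gauss, S_seq)
```

Larger μ pulls the weights toward an even split (0.5, 0.5), which matches the stated role of the regularization term in guarding against overfitting to one source.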

S5. Based on the virus-drug adjacency matrix, the virus integrated similarity matrix and the drug integrated similarity matrix, construct a loss function using the sparse regularized joint projection method.

The loss function constructed with the sparse regularized joint projection method is as follows:

[Equation SMS_26: loss function J(W1, W2, P) (image not reproduced)];

where

$$R(A,B,W_1,W_2) = \operatorname{tr}\left((AW_1)^{T} L_1 (AW_1)\right) + \operatorname{tr}\left((BW_2)^{T} L_2 (BW_2)\right)$$

is the manifold regularization term. W1 ∈ R^{d1×nd} and W2 ∈ R^{d2×nv} are the coefficient matrices to be solved, and P ∈ R^{d1×d2} is the projection matrix. J(W1, W2, P) denotes the value of the right-hand side, which varies with the variables W1, W2 and P. Y ∈ R^{nv×nd} is the known virus-drug association matrix, where nv and nd are the numbers of viruses and drugs, respectively. A and B are low-rank matrices obtained by decomposing the virus integrated similarity matrix S_v ∈ R^{nv×nv} and the drug integrated similarity matrix S_d ∈ R^{nd×nd}, i.e., S_v ≈ AA^T and S_d ≈ BB^T. A ∈ R^{nv×d1} and B ∈ R^{nd×d2} represent the latent features of viruses and drugs, and d1 and d2 are the ranks of S_v and S_d, respectively. λ1, λ2, λ3 and α are regularization parameters. ||·||_F denotes the Frobenius norm, ||·||_* the nuclear norm, and ||·||_{2,1} the L2,1 norm. L1 ∈ R^{nv×nv} and L2 ∈ R^{nd×nd} are graph normalized Laplacian matrices, calculated as:

[Equation SMS_28: graph normalized Laplacian L1 (image not reproduced)];

[Equation SMS_29: graph normalized Laplacian L2 (image not reproduced)];

where D_v ∈ R^{nv×nv} and D_d ∈ R^{nd×nd} are diagonal matrices whose elements are calculated as:

[Equation SMS_30: diagonal elements of D_v and D_d (image not reproduced)].
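The text does not say how S_v and S_d are factorized, and the Laplacian images are not reproduced. The NumPy sketch below therefore makes two assumptions.

1. The low-rank factor A (with S ≈ AAᵀ) is built from the top-d eigenpairs of the symmetric similarity matrix.
2. The graph normalized Laplacian takes the standard form L = I − D^(−1/2) S D^(−1/2), with D_ii = Σ_j S_ij.

Both function names are hypothetical.

```python
import numpy as np

def low_rank_factor(S, d):
    """Return A (n x d) with S ~= A @ A.T from the top-d eigenpairs of symmetric S.

    Negative eigenvalues are clipped to 0 so the square root stays real.
    """
    S = (S + S.T) / 2.0                    # symmetrise against rounding noise
    vals, vecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:d]       # indices of the d largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

def normalized_laplacian(S):
    """ASSUMED form L = I - D^{-1/2} S D^{-1/2}, with D_ii = sum_j S_ij."""
    deg = S.sum(axis=1)
    safe = np.where(deg > 0, deg, 1.0)               # avoid dividing by zero
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    return np.eye(S.shape[0]) - inv_sqrt[:, None] * S * inv_sqrt[None, :]

S = np.array([[1.0, 0.5], [0.5, 1.0]])
A = low_rank_factor(S, 2)    # exact factorisation (full rank)
A1 = low_rank_factor(S, 1)   # rank-1 approximation
L = normalized_laplacian(S)
```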

S6. Solve the loss function to obtain the virus-drug prediction score matrix.

To solve the loss function, take the partial derivatives of J with respect to W1 and W2 and set them to 0. This gives the update formulas for W1 and W2:

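$$W_1 = \left(2A^{T}A + 2\alpha A^{T}L_1A + \lambda_1 Z_1\right)^{-1} 2A^{T}Y;$$

$$W_2 = \left(2B^{T}B + 2\alpha B^{T}L_2B + \lambda_2 Z_2\right)^{-1} 2B^{T}Y^{T};$$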

where the matrices Z1 ∈ R^{d1×d1} and Z2 ∈ R^{d2×d2} are diagonal matrices satisfying:

[Equation SMS_33: diagonal elements of Z1 and Z2 (image not reproduced)];

In the above formula, ||w1^i||_2 and ||w2^i||_2 denote the 2-norms of the i-th rows of W1 and W2, respectively;
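The closed-form W updates translate directly into NumPy. The same function serves W1 (with A, L1, Y) and W2 (with B, L2, Yᵀ). Because the Z image is not reproduced, the sketch assumes Z_ii = 1/||w^i||₂ computed from the previous iterate. This is the reweighting that makes λ·Z·W the derivative of λ·||W||_{2,1}. A small eps guards zero rows. `update_W` is a hypothetical name, and `pinv` mirrors the pinv() usage described for the Matlab implementation.

```python
import numpy as np

def update_W(A, T, L, alpha, lam, W_prev, eps=1e-8):
    """W = (2A^T A + 2*alpha*A^T L A + lam*Z)^+ 2 A^T T, with Z from W_prev.

    ASSUMED reweighting: Z = diag(1 / ||w^i||_2) of the previous iterate's rows.
    """
    row_norms = np.linalg.norm(W_prev, axis=1)
    Z = np.diag(1.0 / np.maximum(row_norms, eps))
    M = 2.0 * A.T @ A + 2.0 * alpha * A.T @ L @ A + lam * Z
    return np.linalg.pinv(M) @ (2.0 * A.T @ T)

rng = np.random.default_rng(0)
A = rng.random((6, 3))                 # toy latent features (nv=6, d1=3)
W_true = rng.random((3, 4))
Y = A @ W_true                         # toy target that A can fit exactly
L1 = np.eye(6)                         # placeholder Laplacian for the toy run
# With alpha = lam = 0 the update reduces to ordinary least squares.
W1 = update_W(A, Y, L1, alpha=0.0, lam=0.0, W_prev=np.ones((3, 4)))
W1_sparse = update_W(A, Y, L1, alpha=0.5, lam=1.0, W_prev=W1)
```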

To obtain the partial derivative of the nuclear norm of the matrix P, first compute the singular value decomposition (SVD) P = UΣV^T, where U ∈ R^{d1×d1} and V ∈ R^{d2×d2}. Then construct E ∈ R^{d2×d1}: if d1 > d2, E is the first d2 rows of the identity matrix I ∈ R^{d1×d1}; otherwise, E is the first d1 rows of the identity matrix I ∈ R^{d2×d2}. The matrix P is then calculated as:

[Equation SMS_34: update formula for P (image not reproduced)];
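The nuclear-norm derivative built from the SVD and a rectangular identity can be checked numerically. In this sketch, E is oriented as d1×d2 so that the product U·E·Vᵀ conforms. This is the standard subgradient of ||P||_* and not the patent's P update formula, which is not reproduced above. The function name is hypothetical.

```python
import numpy as np

def nuclear_norm_grad(P):
    """(Sub)gradient of ||P||_*: U E V^T with E the d1 x d2 rectangular identity."""
    d1, d2 = P.shape
    U, _, Vt = np.linalg.svd(P)   # full SVD: U is d1 x d1, Vt is d2 x d2
    E = np.eye(d1, d2)            # ones on the main diagonal, zeros elsewhere
    return U @ E @ Vt

rng = np.random.default_rng(0)
P = rng.random((20, 30))          # same shape as P in the example (d1=20, d2=30)
G = nuclear_norm_grad(P)
```

A quick sanity check is that the inner product of G and P equals the sum of the singular values of P, that is, ||P||_*.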

S7. From the virus-drug prediction score matrix, extract the scores in the row of the target virus and sort them to obtain the final prediction result.

The matrices in the above steps are updated iteratively until convergence, giving the virus-drug prediction score matrix:

[Equation SMS_35: prediction score matrix Y* (image not reproduced)];

The scores in the row corresponding to the specific virus are extracted and sorted to obtain the final prediction result.
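The ranking in S7 amounts to sorting one row of Y* in descending order. The sketch below uses a hypothetical function name, toy scores and toy drug names. A stable sort keeps tied drugs in their original column order.

```python
import numpy as np

def rank_drugs_for_virus(Y_star, virus_index, drug_names, top_k=20):
    """Sort one row of the score matrix Y* in descending order; return the top_k drugs."""
    scores = np.asarray(Y_star)[virus_index]
    order = np.argsort(-scores, kind="stable")  # stable: ties keep column order
    return [(drug_names[j], float(scores[j])) for j in order[:top_k]]

Y_star = np.array([[0.2, 0.9, 0.5],
                   [0.7, 0.1, 0.3]])
top = rank_drugs_for_virus(Y_star, 0, ["drugA", "drugB", "drugC"], top_k=2)
```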

In the above example, after preliminary tuning, the regularization parameters were set to λ1 = 1, λ2 = 32, λ3 = 2⁻¹ and α = 2⁻⁵, with d1 = 20 and d2 = 30. In the Matlab implementation, W1 is randomly initialized as a 20×219 matrix, W2 as a 30×34 matrix and P as a 20×30 matrix, with all elements of the three matrices in the interval (0, 1). All matrix inversions use the pseudo-inverse function pinv(). The iteration stops after a preset 5 iterations, yielding the prediction score matrix Y*, which is then filtered and sorted to give the final prediction result.

Validation of the effectiveness of the invention:

The joint-projection antiviral drug screening method shown in Fig. 1 and Fig. 2 was evaluated with five-fold cross-validation, as follows:

1. All known drug-virus associations were randomly divided into 5 equal groups.
2. Each group in turn served as the test set, with the remaining groups as the training set. Because the Gaussian distance similarity matrices are computed from the known associations, they change whenever a different test group is held out.
3. The training samples were used as input to the method to obtain predictions.
4. The predicted score of each test sample in the group was compared with the scores of the candidate samples.

To reduce the effect of random partitioning on the results, five-fold cross-validation was repeated 100 times.
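The fold construction can be sketched with NumPy: shuffle the known associations, split them into five near-equal groups, and zero the held-out group in a copy of Y. The Gaussian similarities are then recomputed from that copy in each fold. The function name, the seed and the toy matrix are hypothetical. Repeating with 100 different seeds would correspond to the 100 repetitions described.

```python
import numpy as np

def five_fold_masks(Y, n_folds=5, seed=0):
    """Yield (Y_train, test_positions) for each fold over the known associations."""
    rng = np.random.default_rng(seed)
    positives = np.argwhere(Y == 1)      # (row, col) of each known pair
    rng.shuffle(positives)               # shuffles the rows in place
    for test in np.array_split(positives, n_folds):
        Y_train = Y.copy()
        Y_train[test[:, 0], test[:, 1]] = 0  # hide the test associations
        yield Y_train, test

# Hypothetical 4 x 5 matrix with 10 known associations
Y = np.zeros((4, 5))
Y[[0, 0, 1, 1, 2, 2, 3, 3, 0, 2], [0, 3, 1, 4, 0, 2, 3, 4, 1, 4]] = 1
folds = list(five_fold_masks(Y))
```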

The following results were obtained with the Matlab implementation. Fig. 3 compares the AUROC (area under the ROC curve) of the present method, SRJPVDA, with several previously reported virus-drug screening models. The method achieved an AUROC of 0.9075 ± 0.0056 in five-fold cross-validation, outperforming the classical models compared.
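AUROC can be computed without plotting the curve. It equals the probability that a randomly chosen positive (held-out association) scores higher than a randomly chosen negative (candidate pair), with ties counted as one half. This pure-Python sketch is quadratic in the number of pairs, which is fine for illustration. It is not necessarily how the reported values were computed.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as P(positive score > negative score), ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, `auroc([0.9, 0.3], [0.5, 0.1])` gives 0.75, because three of the four positive-negative pairs are ordered correctly.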

In addition, the method was applied to a specific virus, the novel coronavirus SARS-CoV-2. The row of the score matrix Y* corresponding to SARS-CoV-2 gives the predicted scores of candidate drugs. When these are sorted in descending order, 16 of the top 20 drugs are supported by published literature.

The table below lists the top 20 predicted drugs and the PMIDs of the supporting literature.

Rank | Drug | PMID of supporting literature
1 | Ribavirin | 33689451
2 | Chloroquine | 33906514
3 | Nitazoxanide | 36332361
4 | N4-Hydroxycytidine | 35492218
5 | Camostat | 35692220
6 | Amantadine | 35390511
7 | Niclosamide | 34664162
8 | Mizoribine | 17336519
9 | Mycophenolic Acid | 32579258
10 | Gemcitabine | 32432977
11 | Betulinic Acid | Not yet confirmed
12 | Glycyrrhizic Acid | 33041173
13 | Berberine | 36183284
14 | Remdesivir | 35221670
15 | Alisporivir | 32376613
16 | Umifenovir | 36245851
17 | Favipiravir | 35692220, 36332361
18 | Memantine | 32828269
19 | Lopinavir | 32251767
20 | Artemisinin | Not yet confirmed

In summary, the advantages of the present invention are:

1. The L2,1-norm constraint term makes the loss function yield a sparse solution. This reduces the influence of inherent noise in the training dataset and makes the virus-drug association predictions more robust and accurate;

2. The joint projection method fuses virus similarity and drug similarity information, giving the model good scalability and robustness and hence better prediction results;

3. The Laplacian terms incorporate manifold learning theory. This makes the method a semi-supervised model that uses negative-sample information efficiently to improve prediction performance on unknown association pairs.

As shown in Fig. 4, a second aspect of the embodiments of this specification discloses an antiviral drug screening system based on joint projection, comprising:

an adjacency matrix construction module, configured to construct the adjacency matrix of virus-drug associations;

a Gaussian distance similarity matrix calculation module, configured to calculate the virus Gaussian distance similarity matrix and the drug Gaussian distance similarity matrix based on the virus-drug adjacency matrix;

a viral gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, configured to calculate the viral gene sequence similarity matrix from the viral genome sequences and the drug chemical structure similarity matrix from the drug chemical structures;

an integrated similarity matrix calculation module, configured to integrate the virus Gaussian distance similarity matrix and the viral gene sequence similarity matrix with a fast kernel learning method to obtain the virus integrated similarity matrix, and to integrate the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix with the fast kernel learning method to obtain the drug integrated similarity matrix;

a loss function construction module, configured to construct a loss function with the sparse regularized joint projection method based on the virus-drug adjacency matrix, the virus integrated similarity matrix and the drug integrated similarity matrix;

a loss function solving module, configured to solve the loss function to obtain the virus-drug prediction score matrix;

a prediction module, configured to extract the scores in the row of the target virus from the virus-drug prediction score matrix and sort them to obtain the final prediction result.

In the embodiments disclosed in this specification, the antiviral drug screening system based on joint projection further comprises:

a processor, connected to each of the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the viral gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integrated similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module;

a memory, connected to the processor and storing a computer program executable on the processor;

wherein, when the processor executes the computer program, the processor controls the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the viral gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integrated similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module to implement the antiviral drug screening method based on joint projection according to any one of the above.

A third aspect of the embodiments of this specification discloses a computer-readable storage medium storing computer instructions. When a computer reads the computer instructions, the computer executes the antiviral drug screening method based on joint projection according to any one of the above.

The above embodiments illustrate the present invention and do not limit it; changes to the example values or substitution of equivalent elements therefore still fall within the scope of the invention.

From the above detailed description, those of ordinary skill in the art will understand that the present invention achieves the stated objectives and complies with the provisions of patent law.

While preferred embodiments of the invention have been described, those skilled in the art who grasp the basic inventive concept may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention. The above descriptions are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within its protection scope.

It should be noted that the above descriptions of the process are only for illustration and do not limit the scope of application of this specification. Those skilled in the art may make various modifications and changes to the process under the guidance of this specification; such modifications and changes remain within the scope of this specification.

The basic concepts have been described above. To those of ordinary skill in the art who have read this application, the above disclosure is clearly only an example and does not limit this application. Although not expressly stated herein, those skilled in the art may make various modifications, improvements and amendments to this application. Such modifications, improvements and amendments are suggested by this application and therefore remain within the spirit and scope of its exemplary embodiments.

This application uses specific terms to describe its embodiments. For example, "one embodiment", "an embodiment" and/or "some embodiments" refer to a feature, structure or characteristic related to at least one embodiment of this application. It should therefore be emphasized that two or more references to "one embodiment", "an embodiment" or "an alternative embodiment" in different places in this specification do not necessarily refer to the same embodiment. In addition, certain features, structures or characteristics of one or more embodiments of this application may be combined as appropriate.

Furthermore, those of ordinary skill in the art will appreciate that aspects of this application may be illustrated and described in any of several patentable categories or contexts, including any new and useful process, machine, product or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of this application may be implemented entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software, any of which may be referred to as a "unit", "module" or "system". Aspects of this application may also take the form of a computer program product embodied in one or more computer-readable media containing computer-readable program code.

The computer program code required for the operations of each part of this application may be written in any one or more programming languages. These include object-oriented languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET and Python; conventional procedural languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP and ABAP; dynamic languages such as Python, Ruby and Groovy; and other programming languages. The program code may run entirely on the user's computer, as a stand-alone software package on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter cases, the remote computer may be connected to the user's computer through any kind of network, such as a local area network (LAN) or wide area network (WAN), or to an external computer (for example, via the Internet), or be used in a cloud computing environment or as a service such as software as a service (SaaS).

In addition, unless explicitly stated in the claims, the order of the processing elements and sequences described in this application, the use of numbers and letters, and the use of other designations are not intended to limit the order of its processes and methods. Although the foregoing disclosure discusses, through various examples, some embodiments currently considered useful, such details are for illustration only; the appended claims are not limited to the disclosed embodiments and are intended to cover all modifications and equivalent combinations consistent with the substance and scope of the embodiments of this application. For example, although the components described above may be implemented in hardware devices, they may also be implemented as a software-only solution, such as an installation on an existing server or mobile device.

Similarly, to simplify the disclosure and aid understanding of one or more embodiments, the foregoing description sometimes combines multiple features into a single embodiment, figure or description thereof. This manner of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, the inventive subject matter may have fewer features than a single embodiment described above.

Claims (9)

1. An antiviral drug screening method based on joint projection, characterized by comprising the following steps:
S1. constructing an adjacency matrix of virus-drug associations;
S2. calculating a virus Gaussian distance similarity matrix and a drug Gaussian distance similarity matrix based on the adjacency matrix of virus-drug associations;
S3. calculating a viral gene sequence similarity matrix based on viral genome sequences, and calculating a drug chemical structure similarity matrix based on drug chemical structures;
S4. integrating the virus Gaussian distance similarity matrix and the viral gene sequence similarity matrix with a fast kernel learning method to obtain a virus integrated similarity matrix, and integrating the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix with the fast kernel learning method to obtain a drug integrated similarity matrix;
S5. constructing a loss function with a sparse regularized joint projection method based on the adjacency matrix of virus-drug associations, the virus integrated similarity matrix and the drug integrated similarity matrix;
S6. solving the loss function to obtain a virus-drug prediction score matrix;
S7. extracting the scores in the row of the target virus from the virus-drug prediction score matrix and sorting them to obtain a final prediction result;
in S5:
the loss function constructed with the sparse regularized joint projection method is as follows:
Figure FDA0004264544110000011
wherein R(A, B, W1, W2) = tr((AW1)^T L1 (AW1)) + tr((BW2)^T L2 (BW2)) is a manifold regularization term; W1 ∈ R^{d1×nd} and W2 ∈ R^{d2×nv} are coefficient matrices to be solved; P ∈ R^{d1×d2} is a projection matrix; J(W1, W2, P) denotes the value of the right-hand side, which varies with the variables W1, W2 and P; Y ∈ R^{nv×nd} is the known virus-drug association matrix, nv and nd being the numbers of viruses and drugs, respectively; A and B are low-rank matrices obtained by decomposing the virus integrated similarity matrix S_v ∈ R^{nv×nv} and the drug integrated similarity matrix S_d ∈ R^{nd×nd}, i.e., S_v ≈ AA^T and S_d ≈ BB^T, where A ∈ R^{nv×d1} and B ∈ R^{nd×d2} represent latent features of viruses and drugs, and d1 and d2 denote the ranks of S_v and S_d, respectively; λ1, λ2, λ3 and α are regularization parameters; ||·||_F denotes the Frobenius norm, ||·||_* the nuclear norm and ||·||_{2,1} the L2,1 norm; L1 ∈ R^{nv×nv} and L2 ∈ R^{nd×nd} are graph normalized Laplacian matrices, calculated as follows:
Figure FDA0004264544110000021
Figure FDA0004264544110000022
wherein D_v ∈ R^{nv×nv} and D_d ∈ R^{nd×nd} are diagonal matrices whose elements are calculated as follows:
Figure FDA0004264544110000023
2. The antiviral drug screening method based on joint projection according to claim 1, wherein in S1:
known virus-drug association pairs are input to construct the adjacency matrix Y of virus-drug associations;
if an association pair is known, the corresponding position is 1, otherwise 0;
the number of rows of the adjacency matrix Y is the number of viruses nv, and the number of columns is the number of drugs nd.
3. The antiviral drug screening method based on joint projection according to claim 1, wherein in S2:
if drug d(i) is associated with a given virus, the corresponding position is set to 1, otherwise 0, forming a 1×nv binary vector recorded as the vector spectrum IP(d(i)) of drug d(i), where nv is the number of viruses; the Gaussian distance similarity between drugs d(i) and d(j) is then calculated as:
$S_d(d(i),d(j)) = \exp\left(-\gamma_d \lVert IP(d(i)) - IP(d(j)) \rVert^2\right)$;
in the above formula, IP(d(j)) is the vector spectrum of drug d(j); the parameter γ_d controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter γ'_d:
Figure FDA0004264544110000024
in a similar manner, the Gaussian distance similarity between viruses v(i) and v(j) is defined: a 1×nd binary vector is obtained as the vector spectrum IP(v(i)) of virus v(i), where nd is the number of drugs, and the Gaussian distance similarity between viruses v(i) and v(j) is calculated as:
$S_v(v(i),v(j)) = \exp\left(-\gamma_v \lVert IP(v(i)) - IP(v(j)) \rVert^2\right)$;
IP(v(j)) is the vector spectrum of virus v(j); the parameter γ_v controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter γ'_v:
Figure FDA0004264544110000031
γ'_d and γ'_v above are both constants.
4. The antiviral drug screening method based on joint projection according to claim 1, wherein in S3:
a viral gene sequence similarity matrix is calculated from the viral genome sequences using a multiple sequence alignment method;
MACCS fingerprints of the drugs are obtained from their chemical structures, and the Tanimoto coefficient is used to calculate the drug chemical structure similarity matrix.
5. The antiviral drug screening method based on joint projection according to claim 1, wherein in S4:
the semidefinite programming formulation of the fast kernel learning method is as follows:
Figure FDA0004264544110000032
wherein the first term is the reconstruction error norm term, representing the magnitude of the integration error of the similarity matrices; the second term is a regularization term used to avoid overfitting; Y is the virus-drug association adjacency matrix; S^v_j (j = 1, 2) denote the virus Gaussian distance similarity matrix and the viral gene sequence similarity matrix, respectively; μ_v is a regularization parameter; λ_v ∈ R^{1×2} is the coefficient vector to be solved; and the virus integrated similarity matrix S_v is obtained from λ_v:
$$S_v = \sum_{j=1}^{2} \lambda_v^{j} S_v^{j}$$
Similarly, the integration coefficients λ_d ∈ R^{1×2} for the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix are obtained in the same way, and the drug integration similarity matrix S_d is calculated as:
$$S_d = \sum_{j=1}^{2} \lambda_d^{j} S_d^{j}$$
where S_d^j (j = 1, 2) denote the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix, respectively.
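The semidefinite program in claim 5 is not reproduced in the source. This sketch therefore assumes the form commonly used for fast kernel learning:
- the target kernel is YYᵀ for viruses (YᵀY for drugs),
- the loss is a Frobenius reconstruction term plus μ‖λ‖²,
- the weights are nonnegative and sum to 1.

With only two kernels, that assumed problem reduces to a one-dimensional convex quadratic. The sketch solves it in closed form rather than with an SDP solver. This is a swapped-in solver, not the patent's procedure, and the function names are hypothetical.

```python
import numpy as np

def two_kernel_weights(target: np.ndarray, S1: np.ndarray, S2: np.ndarray, mu: float) -> np.ndarray:
    """Weights (t, 1 - t) minimizing ||target - t*S1 - (1-t)*S2||_F^2 + mu*(t^2 + (1-t)^2), t in [0, 1].

    ASSUMED formulation. Setting the derivative to zero gives
    t = (<R, D> + mu) / (||D||^2 + 2*mu) with R = target - S2, D = S1 - S2;
    because the objective is convex in t, clipping to [0, 1] yields the constrained minimizer.
    """
    R = target - S2
    D = S1 - S2
    denom = np.sum(D * D) + 2.0 * mu
    t = 0.5 if denom == 0 else (np.sum(R * D) + mu) / denom  # denom == 0: objective is flat
    t = float(np.clip(t, 0.0, 1.0))
    return np.array([t, 1.0 - t])

def integrate(weights, kernels):
    """Integrated similarity S = sum_j lambda_j * S_j."""
    return sum(w * K for w, K in zip(weights, kernels))
```

Usage, assuming `S_v_gip` and `S_seq` are nv×nv virus kernels: `lam_v = two_kernel_weights(Y @ Y.T, S_v_gip, S_seq, mu=1.0)`, then `S_v = integrate(lam_v, [S_v_gip, S_seq])`.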
6. The antiviral drug screening method based on joint projection as claimed in claim 1, wherein in S6:
the matrices W_1 and W_2 are calculated as:
$$W_1 = \left(2A^{T}A + 2\alpha A^{T} L_1 A + \lambda_1 Z_1\right)^{-1} 2A^{T} Y$$
$$W_2 = \left(2B^{T}B + 2\alpha B^{T} L_2 B + \lambda_2 Z_2\right)^{-1} 2B^{T} Y^{T}$$
where L_1 ∈ R^{n×n} and L_2 ∈ R^{m×m} (n = nv, m = nd) are graph-normalized Laplacian matrices; Y ∈ R^{nv×nd} is the known virus-drug association matrix; the matrices A and B are low-rank matrices obtained by decomposing the virus integration similarity matrix S_v ∈ R^{nv×nv} and the drug integration similarity matrix S_d ∈ R^{nd×nd}, i.e., S_v ≈ AA^T and S_d ≈ BB^T, where A ∈ R^{nv×d1} and B ∈ R^{nd×d2} represent the latent features of viruses and drugs, and d1 and d2 denote the ranks of S_v and S_d, respectively; W_1 ∈ R^{d1×nd} and W_2 ∈ R^{d2×nv} are the coefficient matrices to be solved; nv and nd denote the number of viruses and the number of drugs, respectively; λ_1, λ_2, λ_3 and α are regularization parameters; and Z_1 ∈ R^{d1×d1} and Z_2 ∈ R^{d2×d2} are diagonal matrices satisfying:
[Formula images not reproduced in source: diagonal entries of Z_1 and Z_2, computed from the row 2-norms of W_1 and W_2]
where ||W_1^i||_2 and ||W_2^i||_2 denote the 2-norms of the i-th rows of W_1 and W_2, respectively;
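The closed-form update for W_1 in claim 6 can be sketched in NumPy. The definition of the diagonal matrix Z is not reproduced in the source, so this sketch assumes the common L2,1 reweighting Z_ii = 1 / (2‖W^i‖₂ + ε). It also assumes a hypothetical initialization Z = I and alternates the W step with the Z reweighting. The function names, `eps` and `n_iter` are illustrative only.

```python
import numpy as np

def w_update(A, Y, L, Z, alpha, lam):
    """Closed-form step W = (2A^T A + 2*alpha*A^T L A + lam*Z)^(-1) 2A^T Y, as in claim 6."""
    lhs = 2.0 * A.T @ A + 2.0 * alpha * A.T @ L @ A + lam * Z
    rhs = 2.0 * A.T @ Y
    return np.linalg.solve(lhs, rhs)  # solve the linear system instead of forming the inverse

def row_reweight(W, eps=1e-8):
    """ASSUMED diagonal Z with Z_ii = 1 / (2 ||W^i||_2 + eps), a common L2,1 reweighting."""
    return np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))

def solve_w(A, Y, L, alpha, lam, n_iter=20):
    """Alternate the closed-form W step with the Z reweighting for n_iter W updates."""
    Z = np.eye(A.shape[1])            # hypothetical initialization: Z = I
    W = w_update(A, Y, L, Z, alpha, lam)
    for _ in range(n_iter - 1):
        Z = row_reweight(W)
        W = w_update(A, Y, L, Z, alpha, lam)
    return W
```

By symmetry, W_2 would be obtained as `solve_w(B, Y.T, L_2, alpha, lam_2)`. With α = λ = 0, the update reduces to ordinary least squares, which gives a convenient sanity check.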
when computing the partial derivative of the nuclear norm of the projection matrix P, P is first decomposed by singular value decomposition, P = UΣV^T, where U ∈ R^{d1×d1} and V ∈ R^{d2×d2}. A matrix E ∈ R^{d2×d1} is then constructed: if d1 > d2, E is taken as the first d2 rows of the identity matrix I ∈ R^{d1×d1}; otherwise, E is taken as the first d1 rows of the identity matrix I ∈ R^{d2×d2}. The partial derivative with respect to P is then calculated as:
[Formula image not reproduced in source: derivative of ||P||_* expressed in terms of U, E and V]
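The patent's exact expression for the nuclear-norm derivative, written with U, E and V, is not reproduced in the source. This sketch uses the standard thin-SVD form U_k V_kᵀ with k = min(d1, d2) instead. For a full-rank P with distinct singular values, this is the gradient of ‖P‖_*, and it is expected to correspond to the U–E–V construction above with E acting as a rectangular identity. For a rank-deficient P, it is one valid subgradient. The function name is hypothetical.

```python
import numpy as np

def nuclear_norm_grad(P: np.ndarray) -> np.ndarray:
    """Gradient (or a subgradient, if P is rank-deficient) of ||P||_*: U_k V_k^T from the thin SVD."""
    U, _, Vt = np.linalg.svd(P, full_matrices=False)  # U: d1 x k, Vt: k x d2, k = min(d1, d2)
    return U @ Vt
```

A central finite difference on `np.linalg.norm(P, 'nuc')` agrees with this gradient for a generic random P.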
7. An antiviral drug screening system based on joint projection, comprising:
an adjacency matrix construction module, configured to construct the virus-drug association adjacency matrix;
a Gaussian distance similarity matrix calculation module, configured to calculate the virus Gaussian distance similarity matrix and the drug Gaussian distance similarity matrix from the virus-drug association adjacency matrix;
a virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, configured to calculate the virus gene sequence similarity matrix from the viral genome sequences and the drug chemical structure similarity matrix from the chemical structures of the drugs;
an integration similarity matrix calculation module, configured to integrate the virus Gaussian distance similarity matrix and the virus gene sequence similarity matrix by the fast kernel learning method to obtain the virus integration similarity matrix, and to integrate the drug Gaussian distance similarity matrix and the drug chemical structure similarity matrix by the fast kernel learning method to obtain the drug integration similarity matrix;
a loss function construction module, configured to construct a loss function by the sparse regularization joint projection method based on the virus-drug association adjacency matrix, the virus integration similarity matrix and the drug integration similarity matrix;
a loss function solving module, configured to solve the loss function to obtain a virus-drug prediction score matrix; and
a prediction module, configured to extract the row of scores for the target virus from the virus-drug prediction score matrix and sort them to obtain the final prediction result;
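The prediction module's final step can be pictured with a small sketch. It takes the target virus's row of the virus-drug prediction score matrix and sorts the drugs by descending score. The patent does not specify tie-breaking or whether known associations are excluded, so this sketch simply uses a stable descending sort. The helper name `rank_drugs_for_virus` and its parameters are hypothetical.

```python
import numpy as np

def rank_drugs_for_virus(F: np.ndarray, virus_idx: int, drug_names=None, top_k=None):
    """Rank drugs for one target virus by its row of the prediction score matrix F (nv x nd)."""
    scores = np.asarray(F)[virus_idx]
    order = np.argsort(-scores, kind="stable")  # descending score; ties keep column order
    if top_k is not None:
        order = order[:top_k]
    names = drug_names if drug_names is not None else list(range(scores.size))
    return [(names[j], float(scores[j])) for j in order]

# Hypothetical 2-virus x 3-drug score matrix.
F = np.array([[0.1, 0.9, 0.5],
              [0.3, 0.2, 0.8]])
ranking = rank_drugs_for_virus(F, 0, ["d1", "d2", "d3"])
```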
the loss function constructed by the sparse regularization joint projection method is:
[Formula image not reproduced in source: loss function J(W_1, W_2, P)]
where [formula image not reproduced in source] is the manifold regularization term; W_1 ∈ R^{d1×nd} and W_2 ∈ R^{d2×nv} are the coefficient matrices to be solved; P ∈ R^{d1×d2} is the projection matrix; J(W_1, W_2, P) denotes the value of the right-hand side, which depends on the variables W_1, W_2 and P; Y ∈ R^{nv×nd} is the known virus-drug association matrix, and nv and nd denote the number of viruses and the number of drugs, respectively; the matrices A and B are low-rank matrices obtained by decomposing the virus integration similarity matrix S_v ∈ R^{nv×nv} and the drug integration similarity matrix S_d ∈ R^{nd×nd}, i.e., S_v ≈ AA^T and S_d ≈ BB^T, where A ∈ R^{nv×d1} and B ∈ R^{nd×d2} represent the latent features of viruses and drugs, and d1 and d2 denote the ranks of S_v and S_d, respectively; λ_1, λ_2, λ_3 and α are regularization parameters; ||·||_F denotes the Frobenius norm, ||·||_* the nuclear norm, and ||·||_{2,1} the L_{2,1} norm; L_1 ∈ R^{n×n} and L_2 ∈ R^{m×m} (n = nv, m = nd) are graph-normalized Laplacian matrices, given by:
[Formula images not reproduced in source: definitions of L_1 and L_2 in terms of the diagonal matrices D_v and D_d]
where D_v ∈ R^{nv×nv} and D_d ∈ R^{nd×nd} are diagonal matrices whose elements are calculated as follows:
[Formula image not reproduced in source: diagonal elements of D_v and D_d]
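The definitions of L_1, L_2, D_v and D_d are not reproduced in the source. The sketch below assumes the common symmetric normalized graph Laplacian L = I − D^(−1/2) S D^(−1/2), with the degree matrix D_ii = Σ_j S_ij. Under that assumption, it would be applied as L_1 = `normalized_laplacian(S_v)` and L_2 = `normalized_laplacian(S_d)`. The function name is hypothetical.

```python
import numpy as np

def normalized_laplacian(S: np.ndarray) -> np.ndarray:
    """ASSUMED symmetric normalized Laplacian L = I - D^(-1/2) S D^(-1/2), with D_ii = sum_j S_ij."""
    S = np.asarray(S, dtype=float)
    deg = S.sum(axis=1)                      # diagonal of the degree matrix D
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])    # isolated nodes get 0 instead of inf
    return np.eye(S.shape[0]) - inv_sqrt[:, None] * S * inv_sqrt[None, :]
```

For a symmetric, nonnegative similarity matrix, the result is symmetric and positive semidefinite. It also maps the vector of square-rooted degrees to zero.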
8. The joint projection-based antiviral drug screening system according to claim 7, further comprising:
a processor, connected to each of the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integration similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module; and
a memory, coupled to the processor and storing a computer program executable on the processor; wherein, when executing the computer program, the processor controls the adjacency matrix construction module, the Gaussian distance similarity matrix calculation module, the virus gene sequence similarity matrix and drug chemical structure similarity matrix calculation module, the integration similarity matrix calculation module, the loss function construction module, the loss function solving module and the prediction module so as to implement the joint projection-based antiviral drug screening method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform the joint projection-based antiviral drug screening method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310418161.1A CN116153391B (en) 2023-04-19 2023-04-19 Antiviral drug screening method, system and storage medium based on joint projection

Publications (2)

Publication Number Publication Date
CN116153391A 2023-05-23
CN116153391B 2023-06-30

Family

ID=86358546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310418161.1A Active CN116153391B (en) 2023-04-19 2023-04-19 Antiviral drug screening method, system and storage medium based on joint projection

Country Status (1)

Country Link
CN (1) CN116153391B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116631502A (en) * 2023-07-24 2023-08-22 Chinese PLA General Hospital Antiviral drug screening method, system and storage medium based on hypergraph learning
CN116759015B (en) * 2023-08-21 2023-11-24 Chinese PLA General Hospital Antiviral drug screening method and system based on hypergraph matrix tri-decomposition
CN116759016A (en) * 2023-08-21 2023-09-15 Chinese PLA General Hospital Antiviral drug screening method, system and storage medium based on least squares method
CN116798545B (en) * 2023-08-21 2023-11-14 Chinese PLA General Hospital Antiviral drug screening method, system and storage medium based on non-negative matrix

Citations (2)

Publication number Priority date Publication date Assignee Title
CN112348115A (en) * 2020-11-30 2021-02-09 Fuzhou University Feature selection method for solving biological data classification
CN113241115A (en) * 2021-03-26 2021-08-10 Guangdong University of Technology Depth matrix decomposition-based circular RNA disease correlation prediction method

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US11037684B2 (en) * 2014-11-14 2021-06-15 International Business Machines Corporation Generating drug repositioning hypotheses based on integrating multiple aspects of drug similarity and disease similarity
US20190114390A1 (en) * 2017-10-13 2019-04-18 BioAge Labs, Inc. Drug repurposing based on deep embeddings of gene expression profiles
CN110993121A (en) * 2019-12-06 2020-04-10 Nankai University Drug association prediction method based on double-cooperation linear manifold
CN111785320B (en) * 2020-06-28 2024-02-06 Xidian University Drug target interaction prediction method based on multi-layer network representation learning
CN115966252B (en) * 2023-02-12 2024-01-19 Chinese PLA General Hospital Antiviral drug screening method based on L1 norm graph

Also Published As

Publication number Publication date
CN116153391A (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN116153391B (en) Antiviral drug screening method, system and storage medium based on joint projection
Wang et al. Identification of human microRNA-disease association via low-rank approximation-based link propagation and multiple kernel learning
CN116189760B (en) Antiviral drug screening method, system and storage medium based on matrix completion
Monteiro et al. Drug-target interaction prediction: end-to-end deep learning approach
Töpfer et al. Probabilistic inference of viral quasispecies subject to recombination
Yang et al. Overlap matrix completion for predicting drug-associated indications
CN115966252B (en) Antiviral drug screening method based on L1 norm graph
Asfand-E-Yar et al. Multimodal CNN-DDI: using multimodal CNN for drug to drug interaction associated events
CN114913916A (en) Drug relocation method for predicting new coronavirus adaptive drugs
Lu et al. DR2DI: a powerful computational tool for predicting novel drug-disease associations
Shi et al. Multiview robust graph-based clustering for cancer subtype identification
CN104021316B (en) Based on the method that the matrix decomposition that gene space merges predicts new indication to old medicine
Weighill et al. Gene regulatory network inference as relaxed graph matching
Yi et al. Learning representation of molecules in association network for predicting intermolecular associations
Uddin et al. Deep-m6Am: a deep learning model for identifying N6, 2′-O-Dimethyladenosine (m6Am) sites using hybrid features.
Daroch et al. MDbDMRP: A novel molecular descriptor-based computational model to identify drug-miRNA relationships
Xie et al. Discovery of novel therapeutic properties of drugs from transcriptional responses based on multi-label classification
Yang et al. Identification of circRNA‐disease associations via multi‐model fusion and ensemble learning
CN116230077A (en) Antiviral Drug Screening Method Based on Restart Hypergraph Double Random Walk
Huang et al. NetPro: neighborhood interaction-based drug repositioning via label propagation
CN116631502A (en) Antiviral drug screening method, system and storage medium based on hypergraph learning
Sun et al. Enhancing drug synergy combination: integrating graph transformers and BiLSTM for accurate drug synergy prediction
Wang et al. WGMFDDA: A novel weighted-based graph regularized matrix factorization for predicting drug-disease associations
CN116631537B (en) Antiviral drug screening method, system and storage medium based on fuzzy learning
CN114662657B (en) A polynomial dendritic neural network and its prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant