Disclosure of Invention
To solve the above problems, the present invention aims to provide a method, based on whole-exome sequencing, for screening SNP sites associated with aortic dissection, wherein the SNP sites comprise the following aortic dissection susceptibility SNP site: CLIP3:NM_001199570:exon1:c.32C>T:p.P11L.
Further, the invention provides application of the SNP locus in preparing a reagent for detecting aortic dissection diseases.
Furthermore, the invention also provides application of the SNP locus in preparing an early diagnosis kit for predicting aortic dissection diseases.
Preferably, the kit comprises a reagent for detecting whether a mutation occurs at the aortic dissection susceptibility SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L.
Preferably, the kit may contain reagents for detecting the SNP by any technique known in the art, as long as the presence or absence of a mutation at the susceptibility SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L in a sample can be detected, including but not limited to the embodiments listed below.
In a first embodiment, the kit comprises a reagent for detecting, by sequencing, the presence of the T allele at the susceptibility SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L in a sample. Sequencing is a technique known in the art; the required reagents, such as primers, can be selected by a person of ordinary skill in the art as needed (see the instructions for sequencers from ABI, Beckman, etc.) and are not described further here. With this kit, the sequence at the susceptibility SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L in a sample can be determined directly by sequencing, so as to judge whether the sample carries the variant allele at that site and thereby assess susceptibility to aortic dissection.
In a second embodiment, the kit comprises a reagent for detecting the genotype of the susceptibility SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L in a sample by the TaqMan probe SNP detection method. The TaqMan probe is designed for the susceptibility SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L; the probe can be supplied by a reagent company, or designed with software such as Beacon Designer 7.5 from PREMIER Biosoft.
In a third embodiment, the kit detects the genotype of the susceptibility SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L in a sample by the PCR-single-strand conformation polymorphism (PCR-SSCP) method. The kit comprises primers for amplifying the susceptibility SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L, PCR reagents, control samples, and the reagents required for conformation-detecting electrophoresis. The electrophoresis is preferably native polyacrylamide gel electrophoresis. The control samples comprise at least one of a negative control sample that is a CC homozygote at the susceptibility SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L and a positive control sample that is a TT homozygote at the site, and may further comprise a control sample of the corresponding CT heterozygote. Preferably, all three types of control samples are included. The amplification product of the test sample and the amplification products of the control samples are electrophoresed simultaneously, and the electrophoresis results are compared to determine whether the test sample carries the corresponding allelic variation.
Furthermore, the invention provides application of the SNP locus in preparing a device for diagnosing aortic dissection diseases.
Preferably, the diagnostic device is a sequencing chip.
Still further, the present invention provides a method for screening for SNP sites associated with aortic dissection disease, comprising the steps of:
(1) extracting DNA from peripheral blood samples of aortic dissection patients and normal controls;
(2) fragmenting the DNA by sonication, end-repairing the fragments, adding an A at the 3' end, ligating adaptors, and selecting fragments of 350-400 bp to prepare a whole-genome library;
(3) performing whole-exome detection using the GenCap liquid-phase target gene capture technology, and performing paired-end sequencing on a high-throughput sequencer with a read length of 100 bp;
(4) after exome sequencing, performing conventional filtering analysis, with hg19 and dbSNP (v147) as the reference versions for screening and filtering; retaining pathogenic sites, sites with a frequency below 5% in the normal population, and synonymous pathogenic sites reported in the literature; to screen disease-related sites more accurately, filtering out SNPs predicted to be benign by all of SIFT, Polyphen2, MutationTaster and GERP++; and filtering indels using the thresholds Mutcount > 5, MutRatio > 30% and MAF > 1% in the normal database;
(5) eliminating false-positive sites: the de-duplicated bam file (rmdup.sorted.bam) is inspected and verified with IGV software, and a site whose mutation status is inconsistent is regarded as a false positive; the bam file is also checked with samtools. Rare variant burden (RVB) analysis is then performed using (a) the mutated gene and (b) the mutated site as markers, respectively; the number of samples carrying the mutation in the case group and the control group is counted and the OR value is calculated, with OR > 1 and P < 0.01 as the selection criteria. A plurality of mutation sites are found to be statistically associated with aortic dissection (AD);
(6) through annotation analysis of gene functions and related pathways, and in-depth analysis of the sequencing results in combination with literature retrieval, the inventors finally screened out one aortic dissection susceptibility SNP site from 48 mutation sites: CLIP3:NM_001199570:exon1:c.32C>T:p.P11L.
The invention has the beneficial effects that:
By discovering an aortic dissection susceptibility SNP site and studying its prospects for the early prediction of aortic dissection, the invention provides an aortic dissection related gene site that can serve as an early biomarker of aortic dissection, and provides a new direction for further research into the genetic and molecular mechanisms of aortic dissection and for exploring drug targets for its early prevention and treatment.
Detailed Description
The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention. Unless otherwise specified, the technical means used in the examples are conventional means well known to those skilled in the art.
The terms in the present invention are explained as follows:
GO (Gene Ontology): a comprehensive database describing gene function, divided into three parts: Molecular Function (MF), Biological Process (BP) and Cellular Component (CC). GO enrichment is considered significant at p < 0.05.
Example 1 Sample collection
From January 2017 to April 2018, 2 mL of whole blood was collected from each of 59 patients diagnosed with Stanford type A aortic dissection by aortic CTA at Shenzhen Sun Yat-sen Cardiovascular Hospital; 590 normal control samples were retrieved from the database of the Mikeno sequencing company. Informed consent was obtained from the patients, and the study was approved by the ethics committee.
Sample treatment: EDTA-anticoagulated whole blood was mixed thoroughly with Trizol at a ratio of 1:1, placed in a 1.8 mL cryovial, snap-frozen in liquid nitrogen for 30 s, and stored in a refrigerator at -80 °C.
Example 2 Extraction of DNA from blood samples
(1) adding 1 mL of cell lysate to 1 mL of blood anticoagulated with EDTA (0.01 M; Huamei Bioengineering, China), gently inverting 6 times to mix, centrifuging at 3600 rpm for 5 min, and discarding the supernatant;
(2) adding 1 mL of CL cell lysate to the centrifuge tube again, gently inverting 6 times to mix, centrifuging at 3600 rpm for 5 min, and discarding the supernatant; while making sure the pellet remains in the tube, inverting the centrifuge tube on clean absorbent paper and leaving it to stand for 2 min;
(3) preparing a mixed solution of proteinase K and buffer FG;
(4) adding 500 μL of the mixed solution of proteinase K and buffer FG, and mixing until the solution contains no lumps;
(5) incubating in a water bath at 65 °C for 30 min, inverting several times to mix;
(6) adding 1 mL of isopropanol, and inverting to mix until clumped or filamentous genomic DNA appears;
(7) centrifuging at 3600 rpm for 8 min, and discarding the supernatant; while making sure the pellet remains in the tube, inverting the centrifuge tube on clean absorbent paper and leaving it to stand for 2 min;
(8) adding 1 mL of 70% ethanol, vortexing for 5 s, centrifuging at 3600 rpm for 3 min, and discarding the supernatant;
(9) repeating step (8);
(10) while making sure the pellet remains in the tube, inverting the centrifuge tube on clean absorbent paper and leaving it to stand for 5 min;
(11) at room temperature, air-drying the genomic DNA pellet until all liquid has evaporated (at least 5 min), vortexing for 5 s, centrifuging at 3600 rpm for 3 min, and discarding the supernatant;
(12) adding 200 μL of TB buffer, vortexing at low speed for 5 s, heating in a water bath at 65 °C for 1 h, and flicking the tube several times to aid dissolution;
(13) determining the concentration and purity of the extracted genomic DNA using a NanoDrop ND8000 (Thermo, USA); a genomic DNA sample with a concentration above 30 ng/μL, 1.8 < OD260/OD280 < 2.0, and a total amount above 3 μg is regarded as qualified and stored in a refrigerator at -80 °C until use;
(14) following the preliminary DNA quantification, agarose gel electrophoresis (gel concentration: 0.8%; voltage: 120 V; time: 20 min) is further used to quantify the DNA more accurately. A strong band after electrophoresis indicates that the extracted DNA is of high quality. Whole-exome sequencing requires a DNA concentration of at least 50 ng/μL, a total amount of not less than 5 μg, an OD260/280 between 1.8 and 2.0, an OD260/230 of around 2.0, no RNA contamination, and no or only slight degradation.
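The acceptance thresholds stated in step (14) can be expressed as a simple check. The following is a minimal Python sketch and not part of the described protocol; the function name, argument names and the returned list of failure reasons are illustrative assumptions. The OD260/230 criterion ("around 2.0") is not enforced, because the text gives no numeric tolerance for it.

```python
def wes_qc_failures(conc_ng_per_ul, total_ug, od260_280):
    """Return the whole-exome sequencing input criteria that a sample fails.

    Thresholds follow step (14): concentration >= 50 ng/uL,
    total amount >= 5 ug, OD260/280 between 1.8 and 2.0.
    An empty list means the sample meets all three criteria.
    """
    failures = []
    if conc_ng_per_ul < 50:
        failures.append("concentration below 50 ng/uL")
    if total_ug < 5:
        failures.append("total amount below 5 ug")
    if not 1.8 <= od260_280 <= 2.0:
        failures.append("OD260/280 outside 1.8-2.0")
    return failures


# Hypothetical readings, for illustration only.
print(wes_qc_failures(62.0, 6.2, 1.87))  # meets every criterion
print(wes_qc_failures(41.5, 4.0, 1.65))  # fails all three criteria
```

Returning the reasons rather than a single yes/no value makes it easier to see which samples could be rescued, for example by concentrating a dilute extract.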
Example 3 Whole-exome sequencing
1. DNA library construction
3 μg of DNA was fragmented by sonication; the fragments were end-repaired, an A was added at the 3' end, adaptors were ligated, and fragments of 350-400 bp were selected to prepare a whole-genome library. Library samples were quality-controlled using an Agilent 2100 Bioanalyzer (Agilent Technologies, USA).
2. Target region Capture sequencing
Whole-exome detection was performed using the GenCap liquid-phase target gene capture technology (Beijing Mikeno company). An aliquot (μg scale) of the DNA library was mixed with BL buffer and probe, heated at 95 °C for 7 min and at 65 °C for 2 min; 23 μL of HY buffer preheated to 65 °C was added, and hybridization was carried out at 65 °C for 22 h. 50 μL of MyOne magnetic beads (Life Technologies, USA) were washed 3 times with 500 μL of 1× binding buffer and resuspended in 80 μL of 1× binding buffer. 64 μL of 2× binding buffer was added to the hybridization mixture, which was then transferred to the tube containing the 80 μL of MyOne magnetic beads and mixed by rotation for 1 h. The beads were washed with WB1 buffer at room temperature for 15 min, and then with WB3 buffer at 65 °C 3 times, 15 min each. The bound DNA was then eluted with elution buffer. The eluted DNA was amplified by PCR under the following conditions: pre-denaturation at 98 °C for 30 s; 15 cycles of denaturation at 98 °C for 25 s, annealing at 65 °C for 30 s and extension at 72 °C for 30 s; and final extension at 72 °C for 5 min. PCR products were purified using SPRI beads (Beckman Coulter, USA) according to the product instructions. The enriched library was paired-end sequenced on an Illumina HiSeq 2000 sequencer with a read length of 100 bp.
3. Data analysis
The raw reads obtained by sequencing contain adapter sequences and low-quality reads, which interfere with downstream analysis. To ensure the quality of the analysis, the raw data were filtered with Cutadapt; the reads, with adapter sequences and low-quality bases removed, were aligned to the human genome (version GRCh37/hg19) with the Burrows-Wheeler Aligner (BWA), and duplicates were removed with the Picard tool. The depth of the target region was calculated with the coverageBed tool. Reads near indel sites were realigned with GATK to improve alignment quality. SNP and indel variants were called with GATK HaplotypeCaller and annotated with ANNOVAR; the annotation databases included dbSNP147, the 1000 Genomes Project, the Exome Sequencing Project (ESP6500), the Inhouse Database (MyGenetics), GnomAD_Gene_EAS, Online Mendelian Inheritance in Man (OMIM) and the Human Gene Mutation Database (HGMD Professional 2016.10). The pathogenicity prediction tools included SIFT, PolyPhen-2 and MutationTaster, and pathogenicity was classified according to the ACMG standards and guidelines.
4. Variant filtration
To find potentially pathogenic mutations, the data need to be filtered, focusing mainly on mutations in exonic regions and splicing regions. First step: retain the sites classified as pathogenic in the pathogenicity analysis (pathogenic_analysis).
Second step (filtering criteria): retain sites at which the mutant base is supported by more than 5 sequencing reads, the mutation ratio is at least 30%, and the mutation is either absent from, or has a frequency below 5% in, the five normal-population mutation databases 1000g2015apr, ESP6500si, Inhouse, ExAC_ALL and ExAC_EAS; synonymous mutation sites are removed, except those reported in the literature, which are retained.
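The second-step criteria can be sketched as a predicate applied to each annotated variant. This is a minimal Python illustration under stated assumptions, not the pipeline actually used: the record layout (a dict with the keys mut_count, mut_ratio, freq, effect and reported_in_literature) is hypothetical, and a database missing from freq is taken to mean that the variant is absent from that database.

```python
# The five normal-population databases named in the filtering criteria.
NORMAL_DATABASES = ("1000g2015apr", "ESP6500si", "Inhouse", "ExAC_ALL", "ExAC_EAS")


def passes_second_step(variant):
    """Apply the second-step filter to one variant record (hypothetical layout)."""
    # The mutant base must be supported by more than 5 reads.
    if variant["mut_count"] <= 5:
        return False
    # The mutation ratio must be at least 30%.
    if variant["mut_ratio"] < 0.30:
        return False
    # The variant must be absent from, or below 5% in, every normal database.
    for database in NORMAL_DATABASES:
        frequency = variant["freq"].get(database)  # None: not in this database
        if frequency is not None and frequency >= 0.05:
            return False
    # Synonymous variants are dropped unless reported in the literature.
    if variant["effect"] == "synonymous" and not variant.get("reported_in_literature", False):
        return False
    return True


# Hypothetical records, for illustration only.
rare_missense = {"mut_count": 12, "mut_ratio": 0.45,
                 "freq": {"ExAC_ALL": 0.001}, "effect": "missense"}
common_missense = {"mut_count": 30, "mut_ratio": 0.50,
                   "freq": {"1000g2015apr": 0.12}, "effect": "missense"}
print(passes_second_step(rare_missense), passes_second_step(common_missense))
```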
5. Statistical analysis
Rare variant burden analysis was performed on the filtered mutations, with the 590 Mikeno samples as controls. The OR value and its 95% confidence interval were estimated with the metabin function in R using the Mantel-Haenszel method, and a p value below 0.01 in the statistical test was considered significant.
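For a single case-control table, the Mantel-Haenszel odds ratio reduces to the ordinary cross-product ratio, which makes the calculation easy to illustrate. The Python sketch below is not the R analysis described above: the carrier counts are hypothetical, the confidence interval uses the log (Woolf) method, and adding 0.5 to every cell when a zero count occurs is an assumed continuity correction. No p value is computed here.

```python
import math


def odds_ratio_ci(case_carriers, case_total, ctrl_carriers, ctrl_total, z=1.96):
    """Odds ratio and approximate 95% CI for one 2x2 carrier table."""
    a = case_carriers                 # cases carrying the variant
    b = case_total - case_carriers    # cases not carrying it
    c = ctrl_carriers                 # controls carrying the variant
    d = ctrl_total - ctrl_carriers    # controls not carrying it
    if 0 in (a, b, c, d):
        # Assumed continuity correction so that the ratio stays finite.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) by the Woolf method.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 10 of 59 cases and 20 of 590 controls carry a variant.
or_value, ci_low, ci_high = odds_ratio_ci(10, 59, 20, 590)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval that lies entirely above 1 is consistent with enrichment of the variant in the case group; the OR > 1 and P < 0.01 criteria in the text would still need the corresponding significance test.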
6. Results
After exome sequencing, conventional filtering analysis was performed, with hg19 and dbSNP (v147) as the reference versions for screening and filtering. Pathogenic sites, sites with a frequency below 5% in the normal population, and synonymous pathogenic sites reported in the literature were retained, yielding 59351 point mutations. To screen disease-related sites more accurately, SNPs predicted to be benign by all of SIFT, Polyphen2, MutationTaster and GERP++ were further filtered out, and indels were filtered using the thresholds Mutcount > 5, MutRatio > 30% and MAF > 1% in the normal database; 29979 point mutations were obtained by the above screening.
The inventors then eliminated false-positive sites: the de-duplicated bam file (rmdup.cleaned.bam) was inspected and verified with IGV software, and sites with inconsistent mutation status were regarded as false positives; the bam file was also checked with samtools. Rare variant burden (RVB) analysis was then performed using (a) the mutated gene and (b) the mutated site as markers, respectively; the number of samples carrying the mutation in the case group and the control group was counted and the OR value was calculated, with OR > 1 and P < 0.01 as the selection criteria. As a result, 48 mutation sites were found to be statistically associated with AD.
Further, the inventors performed annotation of gene functions and related pathways on the 48 mutation sites selected.
Pathway annotations with a count of at least 2 were plotted for the genes carrying the 48 mutation sites, as shown in Figures 1 and 2; in the BP analysis, the genes carrying these sites were significantly enriched in the oxidation-reduction biological process.
The results of the screening of 48 mutation sites found that:
CLIP3:NM_001199570:exon1:c.32C>T:p.P11L may be associated with AD.
The inventors analysed the sequencing results in depth by combining annotation analysis of gene functions and related pathways with literature retrieval, and finally screened out the gene CLIP3, which was mutated in the case group; the mutated SNP site carried by the gene is shown in Table 1. The CLIP3:NM_001199570:exon1:c.32C>T:p.P11L mutation site is a key molecule for AD, can serve as a novel AD biomarker, and provides an important theoretical basis for guiding early clinical intervention and targeted therapy of AD.
TABLE 1 aortic dissection SNP site information
Note: Gene, gene name; Chr, chromosome; Pos, SNP position; AA_change, amino acid change.
Example 4 Identification of SNP sites of genes associated with aortic dissection susceptibility
1. Study object
Aortic dissection group: the subjects were 25 patients diagnosed with Stanford type A aortic dissection by aortic CTA at Shenzhen Sun Yat-sen Cardiovascular Hospital; blood samples were collected.
Control group: the subjects were 30 individuals without aortic dissection or a history of it, from the physical examination center of Shenzhen Sun Yat-sen Cardiovascular Hospital; blood samples were collected.
2. Pathological and clinical data
The etiology, disease course and treatment course of the patients were recorded, and general clinical data such as age, sex and medical history were registered to establish a database.
Both groups were from the Chinese population, genetically unrelated, and matched for region and sex. The procedure conformed to the ethical standards set by the human experimentation committee, and informed consent was obtained from the subjects. From each subject, 10 mL of whole blood was drawn from the cubital vein, or residual blood remaining after clinical examination was collected; the blood was anticoagulated with sodium citrate and used for whole-genome DNA extraction and serological index detection.
3. Test sample extraction: same as in Example 2
4. PCR amplification of the fragment containing the SNP site
4.1 design of primers
Primers were designed with Primer Premier 5.0 software and synthesized, with the following sequences:
Forward primer: 5'-TAAGACAGATCCTGCCCCGA-3' (SEQ ID NO.1)
Reverse primer: 5'-TCACGATCGTTCACATGGCA-3' (SEQ ID NO.2)
Length of amplification product: 336bp (SEQ ID NO.3)
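The primer pair can be checked against the amplicon given in the sequence listing. The short Python sketch below locates the forward primer and the reverse complement of the reverse primer in SEQ ID NO.3 and derives the product length; the variable and function names are illustrative, and the sequences are copied from SEQ ID NO.1 to NO.3.

```python
# SEQ ID NO.3, copied from the sequence listing (spaces are removed below).
SEQ_ID_NO_3 = (
    "taagacagat cctgccccga tggccccgcc accccgagga gaggaggaag aagaggagga "
    "ggaggatgaa cccgtccccg aggcccccag ccccacccag gagcgccggc agaagcctgt "
    "tgtgcacccc tcggcacctg cccccctccc taaggactac gctttcacct tcttcgatcc "
    "caatgacccg gcgtgccagg agatcctgtt tgaccctcag accaccatcc ccgagctgtt "
    "tgccattgtg cgccagtggg tgccccaagt ccagcacaag atagacgtca tcggcaatga "
    "gattctgcgc cgaggctgcc atgtgaacga tcgtga"
).replace(" ", "").upper()

FORWARD_PRIMER = "TAAGACAGATCCTGCCCCGA"  # SEQ ID NO.1
REVERSE_PRIMER = "TCACGATCGTTCACATGGCA"  # SEQ ID NO.2


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# The reverse primer anneals to the opposite strand, so its reverse
# complement is what appears in the listed (forward-strand) amplicon.
reverse_site = reverse_complement(REVERSE_PRIMER)
forward_start = SEQ_ID_NO_3.find(FORWARD_PRIMER)
reverse_start = SEQ_ID_NO_3.find(reverse_site)
product_length = reverse_start + len(reverse_site) - forward_start

print(forward_start, reverse_start, product_length)
```

Both primers are found at the two ends of the listed sequence, and the derived product length agrees with the stated 336 bp.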
4.2 PCR reaction
The PCR reaction system is shown in Table 2. The PCR amplification program was: pre-denaturation at 95 °C for 10 min; 30 cycles of denaturation at 94 °C for 15 s, annealing at 61 °C for 15 s and extension at 72 °C for 30 s; final extension at 72 °C for 30 min; storage at 4 °C, followed by freezing overnight at -20 °C.
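As a planning aid, the amplification program can be written down as data and its total hold time summed. This Python sketch is illustrative only: the tuple layout and function name are assumptions, and the figure excludes temperature ramping and the 4 °C storage step, so the real run takes longer.

```python
# (step name, temperature in deg C, hold time in seconds, number of repeats)
PCR_PROGRAM = [
    ("pre-denaturation", 95, 10 * 60, 1),
    ("denaturation", 94, 15, 30),
    ("annealing", 61, 15, 30),
    ("extension", 72, 30, 30),
    ("final extension", 72, 30 * 60, 1),
]


def total_hold_seconds(program):
    """Sum the hold times of all steps, counting each cycled step per repeat."""
    return sum(seconds * repeats for _name, _temp, seconds, repeats in program)


total_seconds = total_hold_seconds(PCR_PROGRAM)
print(f"Total hold time: {total_seconds // 60} min")
```

With the values given in the text, the hold times add up to 70 min: 10 min of pre-denaturation, 30 min of cycling and 30 min of final extension.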
TABLE 2 reaction System
5. Sequencing
After PCR amplification, 5 μL of the amplification product was run on a 1% agarose gel for 30 min and stained for 20 min; the gel was then placed in a gel imager for observation, and whether the amplified fragment was correct was preliminarily judged by comparison with the size of the marker fragments. Amplification products that met the requirements were further purified using the Mag-Bind Oligonucleotide labeling kit according to the kit instructions. Loading and sequencing: the BigDye 3.1 Sequencing Kit from ABI was used according to the kit instructions, and sequencing was performed on an ABI 3730 sequencer.
6. Analysis of results
The sequencing results were compared with the standard sequence using Chromas sequence analysis software to locate the SNP site, and the base type at the SNP site was analysed to obtain the genotype of the SNP site. The results show that a SNP site is found at position 27 of the sequence shown in SEQ ID NO.3 in the CLIP3 gene; allele C and allele T exist at this SNP site, forming three genotypes: CT, CC and TT. In the nucleotide sequences of the 336 bp fragment (shown in SEQ ID NO.3) obtained by sequencing the 25 aortic dissection patients, the genotype at position 27 was CT or TT; in the nucleotide sequences of the 336 bp fragment (shown in SEQ ID NO.3) obtained by sequencing the 30 healthy control subjects, the genotype at position 27 was CC. The SNP site CLIP3:NM_001199570:exon1:c.32C>T:p.P11L can therefore be used to assist in the detection, treatment, diagnosis, prognosis evaluation, etc. of aortic dissection.
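Genotype assignment at position 27 can be sketched as a lookup on the base reported at that position. The Python example below is a minimal illustration, not the Chromas workflow itself: it assumes that the exported consensus writes a heterozygous C/T call as the IUPAC code Y, and the helper names are hypothetical. Only the first 30 bases of SEQ ID NO.3 are needed to reach the site.

```python
# First 30 bases of SEQ ID NO.3 from the sequence listing.
REFERENCE_PREFIX = "taagacagat cctgccccga tggccccgcc".replace(" ", "").upper()
SNP_POSITION = 27  # 1-based position within SEQ ID NO.3

# Assumed encoding: Y is the IUPAC ambiguity code for C or T.
BASE_TO_GENOTYPE = {"C": "CC", "T": "TT", "Y": "CT"}


def call_genotype(consensus, position=SNP_POSITION):
    """Return CC, CT or TT from the consensus base at the SNP position."""
    base = consensus[position - 1].upper()
    if base not in BASE_TO_GENOTYPE:
        raise ValueError(f"unexpected base {base!r} at position {position}")
    return BASE_TO_GENOTYPE[base]


def with_base(sequence, position, base):
    """Return a copy of the sequence with one 1-based position replaced."""
    return sequence[: position - 1] + base + sequence[position:]


# Hypothetical reads built from the reference, for illustration only.
for read in (REFERENCE_PREFIX,
             with_base(REFERENCE_PREFIX, SNP_POSITION, "Y"),
             with_base(REFERENCE_PREFIX, SNP_POSITION, "T")):
    print(call_genotype(read))
```

In practice a heterozygous call would be confirmed on the chromatogram itself, where overlapping C and T peaks appear at the site.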
Example 5 Preparation of an early diagnosis kit for predicting aortic dissection
The kit for aortic dissection according to the invention is assembled as follows:
The kit comprises a reagent for detecting whether the CLIP3:NM_001199570:exon1:c.32C>T:p.P11L site is mutated.
The reagent comprises primers for amplifying the CLIP3:NM_001199570:exon1:c.32C>T:p.P11L site, which can be designed with software such as Primer5 or Oligo6; the reagent may also include reagents commonly used in PCR, such as dNTPs, MgCl2, double-distilled water and Taq enzyme, which are well known to those skilled in the art; in addition, standards and controls (e.g., a genotype-determining standard and a blank) may be included. The value of the kit is that only peripheral blood is needed, without other tissue samples; the SNP is detected with a simple, specific primer pair, and the SNP profile is then used to assist the assessment of aortic dissection. The kit is stable, convenient and accurate in detection and greatly improves the sensitivity and specificity of disease diagnosis, so that putting it into practice can help guide diagnosis and more effective individualized treatment.
Although the invention has been described in detail hereinabove with respect to a general description and specific embodiments thereof, it will be apparent to those skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.
Sequence listing
<110> Typha
<120> Application of SNP (single nucleotide polymorphism) site of CLIP3 gene
<130> P18073
<160> 3
<170> SIPOSequenceListing 1.0
<210> 1
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
taagacagat cctgccccga 20
<210> 2
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
tcacgatcgt tcacatggca 20
<210> 3
<211> 336
<212> DNA
<213> Homo sapiens
<400> 3
taagacagat cctgccccga tggccccgcc accccgagga gaggaggaag aagaggagga 60
ggaggatgaa cccgtccccg aggcccccag ccccacccag gagcgccggc agaagcctgt 120
tgtgcacccc tcggcacctg cccccctccc taaggactac gctttcacct tcttcgatcc 180
caatgacccg gcgtgccagg agatcctgtt tgaccctcag accaccatcc ccgagctgtt 240
tgccattgtg cgccagtggg tgccccaagt ccagcacaag atagacgtca tcggcaatga 300
gattctgcgc cgaggctgcc atgtgaacga tcgtga 336