AU700405B2

AU700405B2 - Origin of replication complex genes, proteins and methods

Info

Publication number: AU700405B2
Application number: AU13734/95A
Authority: AU
Inventors: Stephen P Bell; Margit Foss; Ira Herskowitz; Ryuji Kobayashi; Patricia Laurenson; Joachim J Li; Francis J McNally; Jasper Rine; Bruce W Stillman
Original assignee: Cold Spring Harbor Laboratory; University of California Berkeley; University of California San Diego UCSD
Current assignee: Cold Spring Harbor Laboratory; University of California
Priority date: 1993-12-16
Filing date: 1994-12-16
Publication date: 1999-01-07
Anticipated expiration: 2014-12-16
Also published as: AU1373495A; CA2178965A1; EP0733057A1; EP0733057A4; JPH09506768A; WO1995016694A1

Description

WO 95/16694 PCT/US94/14563 ORIGIN OF REPLICATION COMPLEX GENES, PROTEINS AND METHODS

INTRODUCTION

The research carried out in the subject application was supported in part by grants from the National Institutes of Health. The government may have rights in any patent issuing on this application.

Technical Field

The technical field of this invention concerns Origin of Replication Complex genes which are involved with DNA transcription and replication.

Background

The elements involved in the early events of eukaryotic DNA replication have begun to emerge in the yeast Saccharomyces cerevisiae. A critical first step was the identification of ARS elements derived from yeast chromosomes, a subset of which were subsequently shown to act as chromosomal origins of DNA replication (reviewed in 11). Sequence comparison of a number of ARS elements resulted in the identification of the ARS consensus sequence (ACS, 12). This sequence is essential for the function of yeast origins of DNA replication (12, 13). Three additional elements required for efficient ARS1 function have been identified. When mutated individually, these elements, referred to as B1, B2, and B3, result in a slight reduction of ARS1 activity. When two or three of the B elements are simultaneously mutated, however, ARS1 function is severely compromised (14).

Proteins that recognize two elements of ARS1 have been identified. The yeast transcription factor ABF1 binds to and mediates the function of the B3 element (11, 14). More recently we have identified a multi-protein complex that specifically recognizes the highly conserved ACS. This activity, referred to as the origin recognition complex (ORC), has several properties that make it an attractive candidate to act as an initiator protein at yeast origins of replication.

Binding of this protein requires the ACS, and the effect of mutations in the consensus sequence on ARS1 function parallels the effect of the same mutations on ORC DNA binding. ORC binds to more than 10 yeast ARS elements, several of which are known origins of DNA replication. Specific DNA binding by ORC requires ATP, suggesting that ORC binds ATP, a property of a number of known initiator proteins. ORC also interacts with other sequences outside of the ACS that are known to be important for ARS function (18, 19). Further support for the hypothesis that ORC mediates the function of the ACS is provided by in situ deoxyribonuclease I (DNase I) footprinting experiments that identify a protected region of ARS1 remarkably similar to that observed with ORC in vitro.

Relevant Literature

A multi-protein complex that recognizes cellular origins of DNA replication was reported in Bell and Stillman (1992) Nature 357, 128-134. Much of the present disclosure was published by Foss et al. (1993), Bell et al. (1993) and Li and Herskowitz (1993), in Science 262, 1838, 1843 and 1870, respectively, issue date December 17, 1993. Wang and Reed (1993) Nature 364, 121-126 report using a single-hybrid screen as disclosed herein.

SUMMARY OF THE INVENTION

Origin of DNA Replication Complex (ORC) genes, recombinant ORC peptides and methods of identifying DNA binding proteins and using the subject compositions are provided.

Provided are compositions comprising isolated nucleic acids encoding unique ORC gene portions, especially portions encoding biologically active unique portions of ORC1-ORC6 proteins. Vectors and cells comprising such DNA molecules find use in the production of recombinant ORC peptides.

The subject compositions are used to isolate ORC genes from a wide variety of species, including human. The subject ORC peptides also find particular use in screening for ORC selective agents useful in the diagnosis, prognosis or treatment of disease, particularly fungal infections and neoproliferative disease.

Particularly useful are agents capable of distinguishing an ORC protein of an infectious organism or transformed cell from the wild-type human homologue.

Also disclosed are methods for identifying a gene encoding a protein which directly or indirectly associates with a selected DNA sequence. Generally, the methods involve transforming an expression library of hybrid proteins into a reporter strain, wherein the library comprises protein-coding sequences fused to a constitutively expressed transcription activation domain and the reporter strain comprises a reporter gene with at least one copy of a selected DNA sequence in its promoter region. Clones expressing the transcription or translation product of the reporter gene are detected and recovered. A preferred method employs an activation domain from GAL4 and a lacZ reporter gene.

BRIEF DESCRIPTION OF SEQUENCE ID NUMBERS

SEQUENCE ID NO:1. DNA Sequence of ORC1.

SEQUENCE ID NO:2. Amino Acid Sequence of ORC1.

SEQUENCE ID NO:3. DNA Sequence of ORC2.

SEQUENCE ID NO:4. Amino Acid Sequence of ORC2.

SEQUENCE ID NO:5. DNA Sequence of ORC3.

SEQUENCE ID NO:6. Amino Acid Sequence of ORC3.

SEQUENCE ID NO:7. DNA Sequence of ORC4.

SEQUENCE ID NO:8. Amino Acid Sequence of ORC4.

SEQUENCE ID NO:9. DNA Sequence of ORC5.

SEQUENCE ID NO:10. Amino Acid Sequence of ORC5.

SEQUENCE ID NO:11. DNA Sequence of ORC6.

SEQUENCE ID NO:12. Amino Acid Sequence of ORC6.

DESCRIPTION OF SPECIFIC EMBODIMENTS

The recombinant polypeptides of the invention comprise unique portions of the disclosed ORC proteins which retain a binding affinity specific to the subject full-length ORC protein. A "unique portion" has an amino acid sequence unique to the subject ORC in that it is not found in previously known proteins and has a length at least long enough to define a peptide specific to that ORC. Unique portions are found to vary from about 5 to about 25 residues, usually from 5 to 10 residues in length, depending on the particular amino acid sequence, and are readily identified by comparing the subject portion sequences with known peptide/protein sequence data bases. Hence, the term polypeptide as used herein defines an amino acid polymer with as few as five residues. ORCs used in the subject screening assays are frequently smaller deletion mutants of full-length ORC proteins. Typically, such deletion mutants are readily generated using conventional molecular techniques and screened for an ORC-specific binding affinity using the various assays described below, e.g. footprint analysis, coimmunoprecipitation, etc.
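The database comparison described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `unique_portions`, the exact-substring matching rule, and the toy sequences are assumptions made for illustration, and a real search would run against a full protein sequence database rather than an in-memory list.

```python
def unique_portions(subject, known_proteins, min_len=5, max_len=25):
    """Return (start, peptide) pairs for windows of `subject` that do not
    occur as an exact substring of any sequence in `known_proteins`.

    Hypothetical helper: window lengths default to the 5-25 residue range
    mentioned in the text.
    """
    hits = []
    for length in range(min_len, max_len + 1):
        for start in range(len(subject) - length + 1):
            window = subject[start:start + length]
            if not any(window in known for known in known_proteins):
                hits.append((start, window))
    return hits


# Toy sequences invented for illustration only (not ORC sequences).
subject = "MAKTLKDLQGW"
known = ["MAKTLKDAAAA", "CCCDLQGWCCC"]

# Restrict the scan to 5-residue windows to keep the output small.
five_mers = [peptide for _, peptide in unique_portions(subject, known, 5, 5)]
print(five_mers)
```

In this toy case only the windows that span the junction between the two known matches come out as unique, which mirrors the idea that a unique portion must be long enough to fall outside any previously known sequence.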

ORC-specific retained binding affinities include the ability to selectively bind a nucleic acid of a defined sequence, an ORC protein or a compound such as an antibody which is capable of selectively binding an ORC protein. As such, binding specificity may be provided by an ORC-specific immunological epitope, lectin binding site, etc. Selective binding is conveniently shown by competition with labeled ligand using recombinant ORC peptide either in vitro or in cell based systems as disclosed herein. Generally, selective binding requires a binding affinity of 10^-6 M, preferably 10^-8 M, more preferably 10^-10 M, under in vitro conditions as exemplified below.

The subject recombinant polypeptides may be free or covalently coupled to other atoms or molecules. Frequently the polypeptides are present as a portion of a larger polypeptide comprising the subject polypeptide where the remainder of the larger polypeptide need not be ORC-derived. The subject polypeptides are typically "isolated", meaning unaccompanied by at least some of the material with which they are associated in their natural state. Generally, an isolated polypeptide constitutes at least about preferably at least about 10%, and more preferably at least about 50% by weight of the total poly/peptide in a given sample. By pure peptide/polypeptide is intended at least about 60%, preferably at least 80%, and more preferably at least about 90% by weight of total polypeptide. Included in the subject polypeptide weight are any atoms, molecules, groups, etc. covalently coupled to the subject polypeptides, such as detectable labels, glycosylations, phosphorylations, etc.

The subject polypeptides may be isolated or purified in a variety of ways known to those skilled in the art depending on what other components are present in the sample and to what, if anything, the polypeptide is covalently linked.

Purification methods include electrophoretic, molecular, immunological and chromatographic techniques, especially affinity chromatography and RP-HPLC in the case of peptides. For general guidance in suitable purification techniques, see Scopes, Protein Purification, Springer-Verlag, NY (1982).

The polypeptides may be modified or joined to other compounds using physical, chemical, and molecular techniques disclosed or cited herein or otherwise known to those skilled in the relevant art to affect their ORC/receptor binding specificity or other properties such as solubility, membrane transportability, stability, toxicity, bioavailability, localization, detectability, in vivo half-life, etc., as assayed by methods disclosed herein or otherwise known to those of ordinary skill in the art. Other modifications to further modulate binding specificity/affinity include chemical/enzymatic intervention (e.g. fatty acid-acylation, proteolysis, glycosylation) and, especially where the poly/peptide is integrated into a larger polypeptide, selection of a particular expression host, etc. Amino and/or carboxyl termini may be functionalized, e.g., for the amino group, acylation or alkylation, and for the carboxyl group, esterification or amidification, or the like.

Many of the disclosed poly/peptides contain glycosylation sites and patterns which may be disrupted or modified, e.g. by enzymes like glycosidases. For instance, N- or O-linked glycosylation sites of the disclosed poly/peptides may be deleted or substituted for by another basic amino acid such as Lys or His for N-linked glycosylation alterations, or deletions or polar substitutions are introduced at Ser and Thr residues for modulating O-linked glycosylation. Glycosylation variants are also produced by selecting appropriate host cells, e.g. yeast, insect, or various mammalian cells, or by in vitro methods such as neuraminidase digestion.

Other covalent modifications of the disclosed poly/peptides may be introduced by reacting the targeted amino acid residues with an organic derivatizing agent (e.g. methyl-3-[(p-azido-phenyl)dithio]propioimidate) or crosslinking agent (e.g. 1,1-bis(diazoacetyl)-2-phenylethane) capable of reacting with selected side chains or termini. For therapeutic and diagnostic localization, the subject poly/peptides thereof may be labeled directly (radioisotopes, fluorescers, etc.) or indirectly with an agent capable of providing a detectable signal, for example, a heart muscle kinase labeling site.

ORC polypeptides with ORC binding specificity are identified by a variety of ways including crosslinking, or preferably, by screening such polypeptides for binding to or disruption of ORC-ORC complexes. Additional ORC-specific agents include specific antibodies that can be modified to a monovalent form, such as Fab, Fab', or Fv, specifically binding oligopeptides or oligonucleotides and most preferably, small molecular weight organic compounds. For example, the disclosed ORC peptides are used as immunogens to generate specific polyclonal or monoclonal antibodies. See, Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor Laboratory, for general methods.

Other prospective ORC specific agents are screened from large libraries of synthetic or natural compounds. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available or readily producible. Additionally, natural and synthetically produced libraries and compounds are readily modified through conventional chemical, physical, and biochemical means. See, e.g. Houghten et al. and Lam et al (1991) Nature 354, 84 and 81, respectively and Blake and Litzi-Davis (1992), Bioconjugate Chem 3, 510.

Useful agents are identified with assays employing a compound comprising the subject polypeptides or encoding nucleic acids. A wide variety of in vitro, cell-free binding assays, especially assays for specific binding to immobilized compounds comprising ORC polypeptide find convenient use. For example, immobilized ORC-ORC or ORC-nucleic acid complexes provide convenient targets for disruption, e.g. as measured by the disassociation of a labelled component of the complex. Such assays are amenable to scale-up, high throughput usage suitable for volume drug screening. While less preferred, cell-based assays may be used to determine specific effects of prospective agents.

Preferred agents are ORC- and species-specific. Useful agents may be found within numerous chemical classes, though typically they are organic compounds; preferably small organic compounds. Small organic compounds have a molecular weight of more than 150 yet less than about 4,500, preferably less than about 1500, more preferably, less than about 500. Exemplary classes include steroids, heterocyclics, polycyclics, substituted aromatic compounds, and the like.

Selected agents may be modified to enhance efficacy, stability, pharmaceutical compatibility, and the like. Structural identification of an agent may be used to identify, generate, or screen additional agents. For example, where peptide agents are identified, they may be modified in a variety of ways as described above, e.g. to enhance their proteolytic stability. Other methods of stabilization may include encapsulation, for example, in liposomes, etc. The subject binding agents are prepared in any convenient way known to those in the art.

For therapeutic uses, the compositions and agents disclosed herein may be administered by any convenient way. Small organics are preferably administered orally; other compositions and agents are preferably administered parenterally, conveniently in a pharmaceutically or physiologically acceptable carrier, e.g., phosphate buffered saline, or the like. Typically, the compositions are added to a retained physiological fluid. As examples, many of the disclosed therapeutics are amenable to direct injection or infusion, topical, intratracheal/nasal administration e.g. through aerosol, intraocularly, or within/on implants e.g. collagen, osmotic pumps, grafts comprising appropriately transformed cells, etc. Generally, the amount administered will be empirically determined, typically in the range of about to 1000 µg/kg of the recipient. For peptide agents, the concentration will generally be in the range of about 50 to 500 µg/ml in the dose administered.

Other additives may be included, such as stabilizers, bactericides, etc. These additives will be present in conventional amounts.

The invention provides isolated nucleic acids encoding ORC genes, their transcriptional regulatory regions and the disclosed unique ORC polypeptides which retain ORC-specific function. As used herein: an "isolated" nucleic acid is present as other than a naturally occurring chromosome or transcript in its natural state and is typically joined in sequence to at least one nucleotide with which it is not normally associated on a natural chromosome; nucleic acids with substantial sequence similarity hybridize under low stringency conditions, for example, at 0 C and SSC (0.9 M saline/0.09 M sodium citrate) and remain bound when subject to washing at 55°C with SSC, wherein regions of non-identity of substantially similar nucleic acid sequences preferably encode redundant codons; a partially pure nucleotide sequence constitutes at least about 5%, preferably at least about 30%, and more preferably at least about 90% by weight of total nucleic acid present in a given fraction; unique portions of the disclosed nucleic acids are of length sufficient to distinguish previously known nucleic acids, hence a unique portion has a nucleotide sequence at least long enough to define a novel oligonucleotide, usually at least about 18 bp in length, preferably at least about 36 nucleotides in length.

Typically, the invention's ORC polypeptide encoding polynucleotides are associated with heterologous sequences. Examples of such heterologous sequences include regulatory sequences such as promoters, enhancers, response elements, signal sequences, polyadenylation sequences, etc., introns, 5' and 3' noncoding regions, etc. According to a particular embodiment of the invention, portions of the coding sequence are spliced with heterologous sequences to produce soluble, secreted fusion proteins, using appropriate signal sequences and optionally, a fusion partner such as β-Gal. For antisense applications where the inhibition of expression is indicated, especially useful oligonucleotides are between about 10 and nucleotides in length and include sequences surrounding the disclosed ATG start site, especially the oligonucleotides defined by the disclosed sequence beginning about 5 nucleotides before the start site and ending about 10 nucleotides after the disclosed start site. The ORC encoding nucleic acids can be subject to alternative purification, synthesis, modification, sequencing, expression, transfection, administration or other use by methods disclosed in standard manuals such as Current Protocols in Molecular Biology (Eds. Ausubel, Brent, Kingston, Moore, Seidman, Smith and Struhl, Greene Publ. Assoc., Wiley-Interscience, NY, NY, 1992) or that are otherwise known in the art.
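The antisense window described above (about 5 nucleotides before the start site to about 10 nucleotides after it) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `antisense_oligo` is hypothetical, "after the start site" is interpreted here as after the last base of the ATG codon, and the sense-strand sequence is a toy example rather than any disclosed ORC sequence.

```python
# Translation table for complementing an unambiguous DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(sense, atg_index, upstream=5, downstream=10):
    """Return the reverse complement of the sense-strand window running from
    `upstream` nt before the ATG to `downstream` nt after the ATG codon.

    Assumption: `atg_index` is the 0-based position of the A of the start codon.
    """
    if sense[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    start = max(0, atg_index - upstream)
    end = min(len(sense), atg_index + 3 + downstream)
    window = sense[start:end]
    return window.translate(COMPLEMENT)[::-1]


# Toy sense-strand sequence invented for illustration only.
sense = "GGCACCATGGCTAAAACGCTGAAAGAT"
oligo = antisense_oligo(sense, sense.index("ATG"))
print(oligo, len(oligo))
```

With the default window the oligonucleotide is 5 + 3 + 10 = 18 nucleotides long whenever the sequence extends far enough on both sides of the start codon.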

The invention also provides vectors comprising the described ORC nucleic acids. A large number of vectors, including plasmid and viral vectors, have been described for expression in a variety of eukaryotic and prokaryotic hosts.

Advantageously, vectors will often include a promoter operably linked to an ORC polypeptide-encoding portion, one or more replication systems for cloning or expression, one or more markers for selection in the host, e.g. antibiotic resistance. The inserted coding sequences may be synthesized, isolated from natural sources, prepared as hybrids, etc. Suitable host cells may be transformed/transfected/infected by any suitable method including electroporation, CaCl2 mediated DNA uptake, viral infection, microinjection, microprojectile, or other methods.

Appropriate host cells include bacteria, archaebacteria, fungi, especially yeast, and plant and animal cells, especially mammalian cells. Of particular interest are E. coli, B. subtilis, Saccharomyces cerevisiae, SF9 cells, C129 cells, 293 cells, Neurospora, and CHO, COS, HeLa cells, immortalized mammalian myeloid and lymphoid cell lines, and pluripotent cells, especially mammalian ES cells and zygotes. Preferred expression systems include COS-7, 293, BHK, CHO, TM4, CV1, VERO-76, HELA, MDCK, BRL 3A, W138, Hep G2, MMT 060562, TRI cells, and baculovirus systems. Preferred replication systems include M13, ColE1, SV40, baculovirus, lambda, adenovirus, AAV, BPV, etc. A large number of transcription initiation and termination regulatory regions have been isolated and shown to be effective in the transcription and translation of heterologous proteins in the various hosts. Examples of these regions, methods of isolation, manner of manipulation, etc. are known in the art.

For the production of stably transformed cells and transgenic animals, the subject nucleic acids may be integrated into a host genome by recombination events. For example, such a nucleic acid can be electroporated into a cell, and thereby effect homologous recombination at the site of an endogenous gene, an analog or pseudogene thereof, or a sequence with substantial identity to an ORC-encoding gene. Other recombination-based methods such as nonhomologous recombinations, deletion of endogenous gene by homologous recombination, especially in pluripotent cells, etc., provide additional applications. Preferred transgenics and stable transformants over-express or under-express (e.g. knock-out cells and animals) a disclosed ORC gene and find use in drug development and as a disease model. Methods for making transgenic animals, usually rodents, from ES cells or zygotes are known to those skilled in the art.

The compositions and methods disclosed herein may be used to effect gene therapy. See, e.g. Zhu et al. (1993) Science 261, 209-211; Gutierrez et al. (1992) Lancet 339, 715-721. For example, cells are transfected with ORC-encoding sequences operably linked to gene regulatory sequences capable of effecting altered ORC expression or regulation. To modulate ORC translation, target cells may be transfected with complementary antisense polynucleotides. For gene therapy involving the grafting/implanting/transfusion of transfected cells, administration will depend on a number of variables that are ascertained empirically. For example, the number of cells will vary depending on the stability of the transferred cells. Transfer media is typically a buffered saline solution or other pharmacologically acceptable solution. Similarly the amount of other administered compositions, e.g. transfected nucleic acid, protein, etc., will depend on the manner of administration, purpose of the therapy, and the like.

The genes encoding six ORC subunits from S. cerevisiae are used to obtain the functional homologues of ORC proteins from other species. For example, we have demonstrated that the ORC1 gene is conserved in a related fungus, Kluyveromyces lactis. The ORC1 genes in both S. cerevisiae and K. lactis contain conserved primary protein sequences that are utilized to obtain the ORC1 gene from other species, including other fungi and human. Using oligonucleotide primers based on the conserved sequences between S. cerevisiae and K. lactis, PCR is used to identify the ORC1 protein in any eukaryotic species. The cloned gene encoding ORC1 polypeptide from any fungi or from human cells is used to express the protein in a bacterial expression system to make antibodies against the polypeptide. These antibodies are used to immunoprecipitate the ORC complex from the relevant species. Using the disclosed techniques for protein sequencing, the sequence of the ORC polypeptides is obtained. Using the protein sequencing methodologies disclosed herein for cloning the S. cerevisiae protein, other genes or cDNAs encoding the ORC polypeptides from other fungi species and from human cells are obtained. As we demonstrate herein how to reconstitute the ORC complex by expressing each of the S. cerevisiae genes in a baculovirus expression vector and infecting Sf9 insect cells with viruses expressing each of the ORC subunits, these genes are used to express the ORC polypeptides and reconstitute activity. In this way, large amounts of ORC protein from any fungi or mammalian species, including human cells, are obtained.
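As a rough illustration of how conserved stretches between two homologues might be located before primers are designed against them, the following Python sketch scans two pre-aligned protein sequences for runs of identical residues. The function name `conserved_blocks`, the minimum block length of 6 residues, and both toy sequences are assumptions made for this example; they are not the ORC1 sequences of S. cerevisiae or K. lactis, and the sketch does not perform the alignment or the back-translation into degenerate primers.

```python
def conserved_blocks(seq_a, seq_b, min_len=6):
    """Return (start, block) pairs for runs of identical, non-gap residues of
    at least `min_len` positions in two sequences that are already aligned
    (equal length, '-' marking gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    blocks = []
    run_start = None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        same = a == b and a != "-"
        if same and run_start is None:
            run_start = i                      # a new identical run begins
        elif not same and run_start is not None:
            if i - run_start >= min_len:       # keep only long enough runs
                blocks.append((run_start, seq_a[run_start:i]))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(seq_a) - run_start >= min_len:
        blocks.append((run_start, seq_a[run_start:]))
    return blocks


# Toy aligned sequences invented for illustration only.
sc_toy = "MSTKLGPSGCGKTAVLE-RQWYDEAHK"
kl_toy = "MATRLGPSGCGKTSILEQRQWYDEAHR"

blocks = conserved_blocks(sc_toy, kl_toy)
print(blocks)
```

Each reported block is a candidate region from which a forward or reverse primer could be derived; choosing between candidates (for example by codon degeneracy) is left out of this sketch.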

Inhibitors of ORC protein in fungi provide valuable reagents to selectively inhibit proliferation of fungal cell division by inhibiting the initiation of DNA replication. This offers a powerful, selective target for antifungal agents valuable in controlling fungal infections in human and other species. For example, as disclosed herein, inhibiting the ORC function by mutation in S. cerevisiae can actually cause the death of the mutant cells.

In human proliferative disorders such as cancer, cells of the diseased tissue undergo uncontrolled cell proliferation. A key event in this cell proliferation is the initiation of DNA replication. Inhibiting the initiation of DNA replication through inhibition of ORC function provides a valuable target for inhibitors of cell growth.

By expressing each of the cDNAs encoding the ORC proteins, either individually or together in an expression system, ORC function is reconstituted in vitro. Using this recombinant, expressed protein, inhibitors of ORC function are obtained that block the initiation of DNA replication in the cell cycle. As described above, small molecular inhibitors of ORC DNA binding or other activities provide valuable reagents as anti-cancer and anti-proliferation drugs.

The following examples are offered by way of illustration and not by way of limitation.

EXAMPLES

Example 1.

Transcriptional silencing and ORC.

The binding of purified ORC to the ARS consensus sequence (ACS) at each of the mating type silencers was tested using a DNase I protection assay (22).

ORC protected the match to the ACS at each of the four silencers in an ATP dependent manner. In addition, at each silencer characteristic hypersensitive sites of DNase I cleavage were observed initiating 12-13 bp from the ACS and extending away from the consensus sequence at approximately 10 bp intervals.

This pattern of DNase I protection and enhanced cleavage is nearly identical to that observed at non-silencer sequences and indicates that ORC binding to these elements is not fundamentally different from its binding at other ARS elements.

At HML-E, HML-I, and HMR-E the only protection observed included the ACS. At HMR-I, however, we observed a second unexpected footprint that did not overlap a strong match to the ACS. Moreover, unlike all previous sites bound by ORC, this protection showed little dependence upon the addition of ATP to the binding reaction. Although there are two partial matches to the ACS in this region, similar sequences in other ARS elements and silencers were not recognized by ORC, suggesting that these sequences did not direct this unusual ATP-independent binding of ORC to DNA. In combination with the protection observed at the ACS, the boundaries of the ORC footprint at HMR-I were very similar to the boundaries of HMR-I defined by deletion mutagenesis. These experiments demonstrate that ORC binds all four of the mating-type silencers, that ORC can bind sequences other than the ACS and that it plays an important role at HML and HMR.

A clear link between ORC function and transcriptional silencing was provided by the finding that a mutation in a gene encoding a subunit of ORC was defective for repression at HMR (below). To clone the genes encoding the various ORC subunits, peptides derived from each of the ORC subunits were sequenced. A candidate gene, referred to as ORC2, was isolated by complementation of a temperature sensitive mutation that showed silencing defects at the permissive temperature. Genetic experiments suggested that ORC2 mediated the silencing function of the ACS at HMR-E, making it a good candidate to encode a subunit of ORC (below). Comparison of the predicted amino acid sequence of ORC2 showed that all of the peptides derived from the 72 kd subunit of ORC were within the open reading frame of the ORC2 gene, indicating that it encoded the second largest subunit of ORC.

ORC2 mutations alter ORC function in vitro.

To address the effect of ORC2 mutations on ORC function in vitro, extracts were prepared from both orc2-1 and ORC2 strains. Fractions derived from wild-type cells showed strong ORC DNase I protection over the ACS and B1 elements of ARS1 in DNase I footprinting. In contrast, fractions derived from orc2-1 cells showed a dramatic reduction in ORC DNA binding activity. The ACS and the B1 element were no longer protected from DNase I cleavage. Only the characteristic enhanced DNase I cleavages in the B domain of ARS1 remained.

Mutations that disrupt ORC DNA binding at ARS1 prevented the residual DNA binding observed with the mutant fractions, indicating that this binding required the ACS. The DNA binding defects were also not due to a general inhibition of DNA binding, as mixing of mutant and wild type fractions did not reduce binding of the wild type protein. Incubation of the mutant cells at the non-permissive temperature was not necessary to observe defects in ORC DNA binding, which explains the defect observed in mating-type regulation at the permissive temperature (below).

To investigate the polypeptide composition of ORC derived from orc2-1 and ORC2 cells, immuno-blots of these fractions were probed with polyclonal antibodies raised against ORC. 30 µg of partially purified ORC derived from either JRY3688 (ORC2) or JRY3687 (orc2-1) was separated on a 10% SDS-polyacrylamide gel and transferred to nitrocellulose. The resulting protein blot was incubated with polyclonal mouse sera raised against the entire ORC complex. These sera detect all but the 50 kd subunit of ORC. Antibody-antigen complexes were detected with horseradish peroxidase conjugated secondary antibodies followed by incubation with a chemiluminescent substrate.

Wild type fractions contained the 120, 72, 62, 56, and 53 kd subunits of ORC in roughly equal quantity. The mutant fractions, however, showed a distinctly different subunit composition. While the amount of the 120 and 56 kd subunits was only slightly reduced relative to the wild type fraction, the amount of the 72, 62, and 53 kd subunits was reduced dramatically. In UV cross-linking experiments the same three subunits are specifically cross-linked to DNA in an ACS and ATP dependent manner, suggesting an important role for one or more of these subunits in ORC DNA binding. Thus, the absence of these subunits explains the defects in DNA binding observed in vitro and indicates that the orc2-1 mutation results in a reduction of ORC stability, or that a defect in Orc2p also results in reduced DNA binding of an intact ORC complex.

orc2-1 cells are defective for entry into S-phase.

The point in the cell cycle at which the essential function of ORC2 is performed in vivo was investigated using alpha factor and hydroxyurea (HU) as cell cycle landmarks. Our results were consistent with the execution of the essential function of Orc2p between late G1 and the initiation of DNA synthesis. Arrest with HU followed by release into the non-permissive temperature resulted in 89% of the cells completing an additional cell cycle, indicating that the essential function for Orc2p was executed before the HU arrest point in the cell cycle. In contrast, blocking the cell cycle with alpha-factor followed by release at the non-permissive temperature resulted in only 41% of the cells completing an additional cell cycle. This phenotype indicates that the Orc2p function was performed at or near the G1-S phase boundary.

To address the role of ORC in yeast DNA replication more directly, the DNA content of asynchronous cultures of either orc2-1 or isogenic wild type cells was measured at various times after shifting from the permissive to the non-permissive temperature by fluorescent cytometric analysis. JRY3687 (orc2-1) or JRY3688 (ORC2) cells grown at 24°C (0 minute time point) or at various times after shifting to the non-permissive temperature (37°C) were fixed, stained with propidium iodide, and analyzed for DNA content using a Coulter Model Epics-C Flow Cytometer. In addition, a small number of cells (approximately 1000) from each time point were returned to the permissive temperature to determine the percentage of cells that remained viable at a given time point. Initially, the DNA content of both wild type and mutant cells was equally divided between 1C and 2C with approximately 10% of the cells in S phase. At early time points after the temperature shift (15-70 minutes) there was a dramatic loss of orc2-1 cells in S-phase, suggesting that entry into S-phase had been halted. Consistent with this hypothesis, as the time course continued the orc2-1 mutant showed a rapid accumulation of cells with a 1C DNA content and a commensurate decrease in cells with a 2C DNA content (50-100 minutes). Between 100 and 120 minutes, a new population of orc2-1 cells was observed that appeared to enter into a delayed S phase. By 150 minutes the bulk of the mutant cells were in this population and after 180 minutes only a few cells remained with a 1C DNA content.

Interestingly, we observed a strong correlation between entry into the new round of DNA synthesis and a loss of orc2-1 cell viability. Similar experiments with isogenic ORC2 cells showed that these effects were specific to the orc2-1 mutation. These findings indicate that at the non-permissive temperature the orc2-1 cells were initially unable to enter S phase, but later entered into an abortive round of DNA replication. Entry into this type of replication appears to be a lethal event. Overall, the analysis of the orc2-1 mutation provides in vivo evidence showing that ORC acts early in S phase in general, and as the initiator protein at yeast origins of replication in particular.

Identification of the ORC6 gene.

A second gene that represented a strong candidate to encode one of the subunits of ORC was the AAP1 gene. This gene was cloned using a novel screen for proteins that bound to the ACS in vivo (below). When the peptide sequences were compared to the predicted amino acid sequence of this gene, we found that all of the peptides derived from the 50 kd subunit of ORC were encoded by the open reading frame of the AAP1 gene. For this reason we now refer to AAP1 as ORC6, as it encodes the smallest of the six ORC subunits. The identification of this gene as a subunit of ORC provides direct evidence that ORC is bound to the ACS in vivo.

Numbered Citations for Introduction and Example 1

1. Callan, Cold Spring Harbor Symp. Quant. Biol. 38, 195-203 (1973).

2. Fangman and Brewer, Cell 71, 363-366 (1992).

3. P. Laurenson and J. Rine, Micro. Rev. 56, 543-560 (1992).

4. D. D. Dubey, et al., Mol. Cell. Biol. 11, 5346-5355 (1991).

5. D. H. Rivier and J. Rine, Science 256, 659-663 (1992).

6. A. M. Miller and K. A. Nasmyth, Nature 312, 247-251 (1984).

7. L. Pillus and J. Rine, Cell 59, 637-647 (1989).

8. A. Axlerod and J. Rine, Mol. Cell. Biol. 11, 1080-1091 (1991).

9. J. L. Campbell and C. S. Newlon, in The Molecular and Cellular Biology of the Yeast Saccharomyces, J. R. Broach, J. R. Pringle, E. W. Jones, Eds. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1991) pp. 41-146.

10. J. Broach, et al., CSH Symp. Quant. Biol. 47, 1165-1173 (1983).

11. Deshpande and Newlon, Mol. Cell. Biol. 12, 4305-4313 (1992).

12. Y. Marahrens and B. Stillman, Science 255, 817-823 (1992).

13. S. P. Bell and B. Stillman, Nature 357, 128-134 (1992).

14. Kornberg and Baker, DNA Replication (Freeman Co, NY, 1991) v2.

15. C. S. Newlon, Microbiol. Rev. 52, 568-601 (1988).

16. Newlon and Theis, Curr. Opin. Genet. Dev. 3, (1993).

17. J. F. X. Diffley and J. H. Cocker, Nature 357, 169-172 (1992).

18. Jacob, et al., CSH Symp. Quant. Biol. 28, 329-348 (1963).

19. DNAse I footprinting was performed as previously described J. B. Feldman, et al., J. Mol. Biol. 178, 815-834 (1984).

21. To obtain sufficient protein for peptide sequencing, a revised purification procedure for ORC was devised. Whole cell extract was prepared from 400 g of frozen BJ926 cells using a bead beater (Biospec Products) until greater than breakage was achieved. One twelfth volume of a saturated (at 4°C) solution of ammonium sulfate was added to the broken cells and stirred for 30 minutes. This solution was then spun at 13,000 x g for 20 minutes. The resulting supernatant was spun in a 45 Ti rotor (Beckman) at 44,000 RPM for 1.5 hrs. Ammonium sulfate (0.27 g/ml) was added to the resulting supernatant, and the resulting precipitate was collected by spinning in the 45 Ti rotor at 40,000 RPM for minutes. The resulting pellet was resuspended in buffer H/0.0 (15) and dialyzed versus H/0.15 M KCl (H with 0.15 M KCl added). Preparation of ORC from this extract was similar to (15) with the following changes. The dsDNA cellulose column was omitted from the preparation and only a single glycerol gradient was performed. Sequencing of peptides derived from ORC subunits was performed using a modification of an "in gel" protocol described previously (40, 41).

Purified ORC (10 µg per subunit) was separated by SDS-PAGE and stained with 0.1% Coomassie Brilliant Blue G (Aldrich). After destaining, the gel was soaked in water for one hour. The protein bands were excised, transferred to a microcentrifuge tube and treated with 200 ng of Achromobacter protease I (Lysylendopeptidase: Wako). The resulting peptides were separated by reverse-phase chromatography and sequenced by automated Edman degradation (Applied Biosystems model 470).

22. To isolate and assay ORC from ORC2 and orc2-1 cells, four liters of JRY3687 (orc2-1, MATa, hmrDA::TRP1 ade2 his3 leu2 trp1 ura3) or the isogenic wild-type strain JRY3688 (ORC2 MATa, hmrDA::TRP1 ade2 his3 leu2 trp1 ura3) were grown to a density of 2 x 10^7 cells per ml. Extracts were prepared as described (24) and fractionated over the first two columns in the preparation of ORC. The peak fraction of ORC DNA binding activity eluted from the Q-Sepharose (Pharmacia) column of each preparation was used for subsequent analysis. Antibodies were raised against the entire ORC complex using a single mouse. The resulting serum was able to recognize all but the 50 kd subunit of ORC.

Proteins were transferred to nitrocellulose and antigen-antibody complexes were detected with horseradish peroxidase conjugated secondary antibodies and a chemiluminescent substrate.

23. Yeast cells were grown to a density of 1-4 x 10^7 cells per ml at 24°C then diluted to a density of 2-4 x 10^6 cells per ml into YPD containing 6 µM alpha-factor and incubated for 2-2.5 hours at 24°C (90% unbudded cells). For the hydroxyurea arrest experiments, alpha-factor was washed away and the cells were resuspended in YPD containing 100 mM hydroxyurea and incubated an additional hours (90% large budded cells). After incubation with the growth inhibitor, cells were briefly sonicated and plated on YPD plates pre-incubated at either 24°C or 37°C and observed at 0, 3, and 6 hours after plating.

24. Yeast cells were grown to a density of 1-4 x 10^7 cells per ml at 24°C and diluted into fresh YPD at either 37°C or 24°C and a density of 2-4 x 10^6 cells per ml. At times after dilution, 3 x 10^6 cells were processed as described (42).

25. The positions of the five peptides derived from the 50 kd subunit of ORC in the ORC6 gene were residues: 51-65; 91-102; 110-105; 207-226; 424-430.

26. K. M. Hennessy, et al., Genes Dev. 4, 2252-2263 (1990).

27. H. Renauld, et al., Genes Dev. 7, 1133-1145 (1993).

28. A. H. Brand, G. Micklem, and K. Nasmyth, Cell 51, 709-719 (1987).

29. McNally and Rine, Mol. Cell. Biol. 11, 5648-5659 (1991).

30. D. D. Brown, Cell 37, 359-365 (1984).

31. A. P. Wolffe, J. Cell Sci. 99, 201-206 (1991).

32. D. Kitsberg, et al., Nature 364, 459-463 (1993).

33. K. S. Hatton, et al., Mol. Cell. Biol. 8, 2149-2158 (1988).

34. V. Dhar, et al., Mol. Cell. Biol. 9, 3524-3532 (1989).

35. L. G. Edgar and J. D. McGhee, Cell 53, 589-599 (1988).

36. L. P. Villarreal, Micro. Rev. 55, 512-542 (1991).

37. H. Kawasaki, et al., Anal. Biochem. 191, 332-336 (1990).

38. H. Kawasaki and K. Suzuki, Anal. Biochem. 186, 264-268 (1990).

39. R. Nash, et al., EMBO J. 7, 4335-4346 (1988).

40. J. Abraham, et al., J. Mol. Biol. 176, 307-331 (1984).

41. D. T. Stinchcomb, et al., Nature 282, 39-43 (1979).

Example 2.

ORC2, a gene required for viability and silencing

In a mutant screen, a temperature-sensitive mutation called orc2-1 was isolated that, at the permissive temperature, resulted in derepression of HMRa flanked by the synthetic silencer and did not cause derepression of HMRa flanked by the wild-type silencer. Because the orc2-1 mutant was temperature-sensitive and silencing defective, it merited further analysis. The temperature resistance of a heterozygous orc2-1/ORC2 diploid (JRY2640) established that the mutation was recessive. The diploid was transformed with a plasmid containing HMRa flanked by a mutant silencer (pJR1212), to provide the MATa1 function required for sporulation. The temperature-sensitive growth phenotype segregated 2 ts : 2 wild type in each of 23 tetrads, indicating that it was caused by a single nuclear mutation. An HMLa mata1 HMRa orc2-1 segregant (JRY3683) was obtained from the diploid following sporulation.

Genetic crosses were used to determine which features in the wild-type silencer distinguished it from the synthetic silencer with respect to derepression by orc2-1. A mata1 HMRa strain (JRY3683) containing the orc2-1 mutation was mated to a MATa strain containing a mutation in the RAP1 binding site of HMR-E flanking HMRa (the HMRa-e-rap1-10 allele; 5401-la) to determine whether orc2-1 could derepress HMRa in the absence of a functional RAP1 binding site. All 29 of the 96 MATa segregants that had little or no mating ability were temperature-sensitive for growth. Nineteen of the MATα temperature-sensitive segregants were mating competent, indicating that the orc2-1 mutation per se was insufficient to disrupt mating ability, and suggesting that the HMRa-e-rap1-10 allele was required in combination with orc2-1 to block mating ability of a strains. A MATα temperature-sensitive segregant from this cross, which mated weakly as an a (JRY4133), was confirmed to have the genotype MATα HMRa-e-rap1-10 orc2-1.

As further evidence that orc2-1 in combination with HMRa-e-rap1-10 blocked the mating ability of MATa strains, a somewhat unusual cross was used to simplify the previous cross by having orc2-1 as the only relevant heterozygous marker. Two MATa HMRa-e-rap1-10 strains (JRY4133 and JRY4132) had complementary auxotrophic markers, allowing for the selection of the rare MATa/MATae diploid formed by a mating event between these two strains. This diploid was able to sporulate due to the low level of expression of HMRa in the diploid caused by the RAP1-site mutation in the HMR-E silencer. One of these strains had the orc2-1 mutation (JRY4133) and the other did not. As expected, the temperature sensitivity segregated 2:2 in each of 34 tetrads. All of the temperature-resistant segregants (two per tetrad) exhibited the a mating phenotype, and all of the temperature-sensitive segregants were either very weak a-maters or were unable to mate at all. The absence of any recombinants between the temperature sensitivity and mating phenotype placed the gene(s) responsible for the temperature sensitivity and the mating defect less than 1.5 centimorgans apart, providing strong evidence that a lesion in a single gene was responsible for both phenotypes. This result was in agreement with the co-reversion of the ts and mating phenotypes described herein.
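The "less than 1.5 centimorgans" bound can be reproduced with a short calculation. The sketch below is illustrative only and rests on one stated assumption: that the smallest detectable recombination event among N tetrads is a single tetratype tetrad, in which half of the spores are recombinant. The function name is hypothetical.

```python
def max_map_distance_cm(n_tetrads: int) -> float:
    """Upper bound on map distance (cM) when no recombinant tetrad is observed.

    Assumption: the smallest detectable event is one tetratype tetrad,
    in which half of the spores are recombinant.
    """
    if n_tetrads <= 0:
        raise ValueError("n_tetrads must be positive")
    return 100.0 * 0.5 / n_tetrads


# 34 tetrads with no recombinants, as in the cross described above
print(f"{max_map_distance_cm(34):.2f} cM")
```

With 34 tetrads the bound evaluates to just under 1.5 cM, in line with the figure given in the text.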

Isolation of multiple alleles of ORC2

Using the information from this analysis of orc2-1, a second screen was performed to identify additional mutations in essential genes with a role in silencer function. This second screen produced 50 mutants that were temperature sensitive for growth, and in which HMRa (flanked by a mutation in the RAP1-binding site) was derepressed at a semi-permissive temperature. Complementation tests for both growth at 37°C and for mating phenotype were performed between orc2-1 and the collection of temperature-sensitive mutants from the second screen. The collection of temperature-sensitive mutants had the mata1 ste14 genotype, but were able to mate as a's due to the derepression of HMRa. These mutants were mated to a mata1 orc2-1 strain (JRY3683) and the diploids were tested for growth at 37°C. All but three diploids were able to grow at the restrictive temperature. The three temperature-sensitive diploids were each presumed to be orc2/orc2 homozygotes due to the inability of the two mutations to complement one another. The mating type of the diploids was checked to determine whether the defect in repression of HMR was complemented. All three diploids mated as a's. Thus, the three mutants were unable to complement either the temperature sensitivity or the mating phenotype of the original orc2-1 mutation. The new mutations (in strains JRY4136, 4137 and 4138) were designated orc2-2, orc2-3, and orc2-4.

To investigate the possibility that the new mutations were in a gene other than ORC2 yet still failed to complement orc2-1, the allelism between orc2-1 and orc2-3 was tested. The original mata1 orc2-3 ste14 mutant was cured of its HMRa plasmid, creating JRY4137, and mated with a MATa orc2-1 strain (JRY3685). In 24 tetrads from this diploid, all segregants were temperature sensitive for growth, indicating strong linkage between orc2-1 and orc2-3 centimorgans). All further studies were performed using the orc2-1 allele, which provided the stronger mutant phenotypes.

Map position of ORC2

Linkage between ORC2 and LYS2, on chromosome II, was evident in crosses between two lys2 strains (JRY2640 and PSY152) and the original orc2-1 isolate (JRY2903) that placed ORC2 approximately 24 centimorgans from LYS2. A third cross (JRY4130 x JRY4134) tested the linkage between sec18, which is centromere proximal to LYS2, and ORC2. Because both orc2-1 and sec18 are temperature sensitive, an ORC2 allele marked by URA3 (from pJR1423) was used to determine that SEC18 and ORC2 were separated by 6.6 centimorgans (Table 1). No previously-mapped genes involved in silencing map near SEC18.
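The map distances reported in Table 1 can be checked against the tetrad counts. The following sketch assumes the standard Perkins formula, cM = 100 x (T + 6 NPD) / (2 x total tetrads); the source does not state which formula was used, and because no NPD tetrads were observed the simpler recombinant-frequency formula gives identical values. The function and dictionary names are illustrative.

```python
def perkins_cm(pd: int, t: int, npd: int) -> float:
    """Map distance in centimorgans from tetrad counts (Perkins formula).

    pd, t, npd: numbers of parental ditype, tetratype and
    non-parental ditype tetrads.
    """
    total = pd + t + npd
    if total == 0:
        raise ValueError("at least one tetrad is required")
    return 100.0 * (t + 6 * npd) / (2 * total)


# Tetrad counts (PD, T, NPD) as listed in Table 1
crosses = {
    "ORC2 vs LYS2 (cross 1)": (10, 14, 0),
    "ORC2 vs LYS2 (cross 2)": (20, 14, 0),
    "ORC2 vs LYS2 (total)": (30, 28, 0),
    "ORC2 vs SEC18": (46, 7, 0),
}

for name, counts in crosses.items():
    print(f"{name}: {perkins_cm(*counts):.1f} cM")
```

Rounded to the precision used in the table, the computed values agree with the tabulated 29, 21, 24 and 6.6 cM.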

Table 1. Linkage of ORC2 to LYS2 and ORC2 to SEC18

                            Tetrad types       Map distance
Cross                       PD    T     NPD    (cM)
ORC2 vs LYS2                10    14    0      29
ORC2 vs LYS2                20    14    0      21
ORC2 vs LYS2 TOTAL          30    28    0      24
ORC2 vs SEC18               46    7     0      6.6

The ORC2 mutants arrested with a cell cycle terminal phenotype.

The effect of the orc2-1 mutation on the cell division cycle was explored: mutant orc2-1 strains were grown in liquid medium at 23°C, the permissive temperature, and then shifted to 37°C to test whether the cells arrested with a single terminal morphology. Specifically, orc2-1 cells (JRY3683) were grown to log phase at the permissive temperature (23°C) and the culture was split. Half of the culture was grown an additional five hours at the permissive temperature and the other half was shifted to the nonpermissive temperature (37°C) and grown for an additional five hours. At that time, both cultures were fixed and stained with DAPI to allow visualization of the nucleus. In the culture maintained at the permissive temperature, cells at all phases of the cell cycle were observed. Cells later in the cell cycle, as evidenced by the presence of large buds, frequently exhibited nuclei in both the mother and the daughter cell. In contrast, in the culture shifted to the restrictive temperature, approximately 90% of the cells arrested as large budded cells. Nuclei were only present in the mother cell and not in the daughter cells. In addition, the cells were larger than those grown at the permissive temperature, indicating that protein synthesis and cell wall synthesis continued in the absence of ORC2 function. Similar results were obtained with two additional orc2-1 strains (JRY3685 and JRY3687).

ORC2 cells harvested either after continuous growth at the permissive temperature or after a shift to the nonpermissive temperature were fixed and stained with DAPI allowing visualization of DNA with fluorescence microscopy.

The cells grown permissively displayed a range of morphologies from small unbudded cells to cells with single buds of various sizes. The cells shifted to the nonpermissive temperature looked very different: the majority arrested as large budded cells, and for the most part, each mother-daughter pair contained only a single brightly-staining region, often at or near the neck. These data indicated that orc2-1 mutants displayed cell cycle defects characteristic of mutants defective in DNA replication.

Cloning of the ORC2 gene

The ORC2 gene was cloned by complementation of the orc2-1 temperature sensitivity. One complementing clone (pJR1416) was chosen for further analysis. Subclones missing various fragments from the insert were retransformed into an orc2 strain to assay whether the deletion affected the clone's ability to complement the temperature sensitivity of orc2-1. The key observations were that the deletion of a 2.8-kb SstI-SstI fragment destroyed complementation activity, whereas the deletions of flanking sequences (XbaI, and the larger SstI fragment) had no effect. The 2.8-kb fragment was subcloned (pJR1263), and shown to possess complementing activity.

To determine whether the gene on the clone was indeed allelic to the ORC2 mutation, a fragment of the original clone was subcloned into a yeast integrating vector. This plasmid (pJR1423) was cleaved within the insert to direct homologous integration and transformed into a wild-type strain (W303-1A). As a result, the site of integration was marked by the plasmid's URA3 gene. The resulting strain (JRY4134) was crossed to an orc2-1 strain (JRY3685). In each of 59 tetrads, URA3 segregated opposite to the temperature sensitivity caused by orc2-1, indicating that ORC2 had indeed been cloned.

ORC2 was essential for cell viability.

ORC2 was disrupted by URA3, and integrated into a diploid homozygous for ura3 and ORC2, (JRY3444). Of the 41 tetrads dissected, tetrads had two live and two dead segregants, and one tetrad had only one live segregant. The colonies that grew were, without exception, Ura-. By inference, the dead segregants contained the URA3 gene, and thus the ORC2 disruption, indicating that ORC2 function was essential for cell viability at all temperatures.

The dead segregants were examined under a microscope to gain some insight into the true null phenotype. Most of the spores germinated into cells that were elongated or otherwise deformed and had not divided. In no case did the cell divide more than two times. Thus in many spores, the absence of ORC2 blocked cell division but not growth.

Role of ORC2 in Plasmid Replication

To test the role of ORC2 in plasmid stability, an isogenic pair of strains, one wild type (W303-1B) and one orc2-1 (JRY4125), were transformed with a plasmid containing a centromere, a suppressor tRNA (SUP11-1), URA3, and ARS1, a chromosomal origin of replication (YRP14/CEN4/ARS1/ARS1) (24), selecting for uracil prototrophy. Transformants were grown on selective medium at 23°C, the permissive temperature for orc2-1. The colonies were picked from the selective plate, serially diluted, plated onto solid rich medium and grown to colonies at 23°C. The wild-type transformants grew into colonies most of which were white with a few exhibiting red sectors. The small fraction of red colonies were from cells in the selectively grown colony that had lost the plasmid. In contrast, the majority of colonies from the orc2-1 mutant were red, reflecting a high degree of plasmid loss among the cells in the selectively grown colony. Moreover, in the orc2-1 strain, red sectors were present in the majority of white colonies with some white colonies displaying multiple red sectors.

It is possible to quantitate the fraction of cell cycles in which a plasmid is lost from the number of colonies that are half red and half white. Only those colonies that lose the plasmid in the first cell division form half red, half white colonies. In the case of the wild-type strain, 0.9% (10/1168) of the colonies were half red and half white, indicating that the plasmid was lost in 0.9% of cell cycles. In contrast, the frequency of half red and half white colonies in the orc2-1 strain grown at the permissive temperature was 11% (58/512), indicating that the same plasmid was lost approximately 12 times as often in the strain with partially defective Orc2p. These data indicated a profound defect in plasmid stability specific to the orc2-1 strain, and in combination with the cell-cycle phenotype of orc2-1, suggested that orc2-1 strains were defective in DNA replication. These results were consistent with the flow cytometry studies of orc2-1 strains herein.
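The loss frequencies quoted above follow directly from the half-sectored colony counts. A minimal sketch of the arithmetic is given below; the function name is illustrative, and the only assumption is the one stated in the text, namely that a half red, half white colony reflects plasmid loss at the first division after plating.

```python
def loss_rate_per_division(half_sectored: int, total_colonies: int) -> float:
    """Fraction of first divisions in which the plasmid was lost."""
    if total_colonies <= 0:
        raise ValueError("total_colonies must be positive")
    return half_sectored / total_colonies


wild_type = loss_rate_per_division(10, 1168)
orc2_1 = loss_rate_per_division(58, 512)

print(f"wild type: {wild_type:.1%} of cell cycles")
print(f"orc2-1:    {orc2_1:.1%} of cell cycles")
print(f"fold increase from raw counts: {orc2_1 / wild_type:.1f}")
```

Computed from the raw counts the ratio is slightly above 13; the "approximately 12 times" in the text corresponds to the ratio of the rounded percentages (11% / 0.9%).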

Sequence of ORC2

The sequence of the 2.8-kb SstI-SstI orc2-complementing fragment was determined and deposited in Genbank (Accession #L23924). The only open reading frame of significant length was deduced to be ORC2, and predicted a 620 residue protein of approximately 68 kD. The SstI fragment included 806 bp of upstream sequence and 140 bp of downstream sequence.
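The figures in this paragraph are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value taken from the source, and assumes the open reading frame ends with a single stop codon; both helper names are illustrative.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an assumption, not from the source


def orf_length_bp(n_residues: int) -> int:
    """Length of an open reading frame in bp, including one stop codon."""
    return 3 * (n_residues + 1)


def approx_mass_kd(n_residues: int) -> float:
    """Rough protein mass in kD from the residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


residues = 620
fragment_bp = 806 + orf_length_bp(residues) + 140  # upstream + ORF + downstream

print(f"ORF length:       {orf_length_bp(residues)} bp")
print(f"Fragment length:  {fragment_bp} bp")
print(f"Approximate mass: {approx_mass_kd(residues):.0f} kD")
```

Under these assumptions the fragment comes to roughly 2.8 kb and the protein to roughly 68 kD, matching the values stated above.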

The deduced Orc2p protein was 15% basic residues and 16% serine/threonines. Fully 50% of the N-terminal residues (residues 15-280) were lysine, arginine, proline, serine, or threonine. The KeyBank motif program revealed several matches to peptide motifs within Orc2p. Orc2p contained many potential phosphorylation sites: 3 for cAMP- and cGMP-dependent protein kinase (starting at residues 57, 433 and 546), 12 for protein kinase C (24, 41, 42, 89, 101, 102, 176, 321, 335, 431, 521, and 549) and 14 for casein kinase II (60, 148, 149, 182, 238, 270, 389, 481, 486, 491, 505, 552, 595, and 605), and a match to the nuclear targeting sequence (residues 103-107). The DNA sequence also contained a perfect match to the RAP1 binding site consensus (starting at nucleotide 595), and two near matches (12/15) to the ABF1-binding consensus sequence (starting at 12 and 609). It was determined by sequence homology that a lysyl tRNA synthetase gene is located to the left of the SstI fragment shown here (Mirande and Waller, 1988), and a kinase homolog to the right.

Another homology is with the region near the catalytic domain of human topoisomerase I, which has diverged among topoisomerase I proteins from other species except for the region surrounding the invariant active-site tyrosine. This region includes a consensus sequence consisting of a serine and lysine residue near the tyrosine. The Orc2p protein also contained such a consensus sequence near its C-terminus. However, mutation of this putative active-site tyrosine to phenylalanine had no detectable effect on the ability of ORC2 to complement the temperature-sensitivity or mating defect of an orc2-1 strain.
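Motif searches of the kind used above to list potential phosphorylation sites in Orc2p can be illustrated with regular expressions. The sketch below is not the motif program named in the text; it assumes PROSITE-style consensus patterns for the three kinase classes, and the test sequence is a made-up peptide, not Orc2p. Positions are reported as 1-based start residues.

```python
import re

# PROSITE-style consensus patterns (assumed here; the source does not list them)
MOTIFS = {
    "cAMP/cGMP-dependent protein kinase": r"[RK]{2}.[ST]",
    "protein kinase C": r"[ST].[RK]",
    "casein kinase II": r"[ST]..[DE]",
}


def find_sites(sequence: str, pattern: str) -> list[int]:
    """Return 1-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the scan report overlapping matches.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", sequence)]


# Hypothetical peptide used only to demonstrate the scan
toy_peptide = "MKRASPEETLKSGRDD"

for kinase, pattern in MOTIFS.items():
    print(f"{kinase}: {find_sites(toy_peptide, pattern)}")
```

Applied to a real protein sequence, the same scan would yield candidate sites only; a consensus match does not show that a site is phosphorylated in vivo.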

Table 2. Strain list.

Strain      Genotype (a)
DBY1034     MATa his4-539 lys2-801 ura3-52 SUC
W303-1A     MATa ade2-1 can1-100 his3-11,15 leu2-3,112 trp1-1 ura3-1
W303-1B     MATα ade2-1 can1-100 his3-11,15 leu2-3,112 trp1-1 ura3-1
PSY152      MATa his3D200 leu2-3,112 lys2-801 ura3-52
JRY4130     MATα his4 ura3 sec18
JRY438      MATa Gal+ his4-519 leu2-3,112 SUC2 ura3-52
JRY543      MATa/MATα ade2-101/ade2-101 his3D200/his3D200 lys2-801/lys2-801 met2/MET2 TYR1/tyr1 ura3-52/ura3-52
JRY2640     mata1 ade2 leu2-3,112 lys2-801 ura3
JRY2698     MATa HMRα ade2-101 his3 leu2 trp1 ura3-52
JRY2699     MATα HMRa ade2-101 his3 leu2 trp1 ura3-52 sir4D::HIS3
JRY2700     MATa HMRα ade2-101 his3 leu2 trp1 ura3-52 pJR924
JRY2903     MATa HMRα ade2-101 his3 leu2 orc2-1 trp1 ura3-52
JRY2904     MATα HMRα ade2-101 his3 leu2 orc2-1 trp1 ura3-52 pJR924
JRY3444     MATa/MATα ade2-101/ade2-101 his3D200/his3D200 lys2-801/lys2-801 met2/MET2 TYR1/tyr1 ura3-52/ura3-52 orc2::Tn10LUK/ORC2
JRY3683     mata1 {HMRa} ade2 his3 leu2 orc2-1 ura3
JRY3685     MATα HMRa-e-rap1-10 ade2 leu2 trp1 orc2-1 ura3
JRY3687     MATa hmrDA::TRP1 ade2 his3 leu2 trp1 ura3 orc2-1
JRY3690     MATa HMRa-e-rap1-10 ade2 his3-11,15 leu2 orc2-1 trp1 ura3
JRY4125     MATa ade2-1 can1-100 his3-11,15 leu2-3,112 orc2-1 trp1-1 ura3-1
JRY4132     MATα HMRa-e-rap1-10 ade2 his3 ura3
JRY4133     MATα HMRa-e-rap1-10 ade2 leu2 orc2-1 trp1 ura3
JRY4134     MATa ade2-1 can1-100 his3-11,15 leu2-3,112 trp1-1 ura3-1 ORC2::pJR1423
JRY4135     mata1 ade2 leu2-3,112 lys2-801 ura3 ste14
JRY4136     mata1 ade2 leu2-3,112 lys2-801 orc2-2 ura3 ste14
JRY4137     mata1 ade2 leu2-3,112 lys2-801 orc2-3 ura3 ste14
JRY4138     mata1 ade2 leu2-3,112 lys2-801 orc2-4 ura3 ste14

(a) Unless otherwise noted, all strains were HMLα and HMRa. HMRa-e-rap1-10 refers to the allele of HMR-E, originally described as that contains a mutation in the RAP1 binding site (21).

Numbered Citations for Example 2.

1. I. Herskowitz, et al., Cold Spring Harbor Laboratory Press 583 (1992).

2. J. Abraham, J. Feldman, K.A. Nasmyth, J.N. Strathern, J.R. Broach, and J. Hicks, C.S.H. Symp. Quant. Biol. 47, 989 (1982). J.B. Feldman, J.B. Hicks, and J.R. Broach, J. Mol. Biol. 178, 815 (1984).

3. J. Rine, and I. Herskowitz, Genetics 116, 9 (1987).

4. Kurtz et al., Genes Dev. 5, 616 (1991); Sussel et al., PNAS 88, 7749 (1991).

5. J.R. Mullen, et al., EMBO J. 8, 2067 (1989).

6. P.S. Kayne, et al., Cell 55, 27 (1988). L.M. Johnson, et al., Proc. Natl. Acad. Sci. USA 87, 6286 (1990). P.D. Megee, et al., Science 247, 841 (1990). E. Park, and J. Szostak, Mol. Cell. Biol. 10, 4932 (1990).

7. P. Laurenson, and J. Rine, Microbiol. Rev. 56, 543 (1992).

8. Brand, et al., Cell 41, 41 (1985); Kimmerly, et al., EMBO J. 7, 2241 (1988).

9. D. Shore, and K. Nasmyth, Cell 51, 721 (1987).

10. M.S. Longtine, et al., Curr. Genet. 16, 225 (1989).

11. A.R. Buchman, et al., Mol. Cell. Biol. 8, 5086 (1988).

12. J.F.X. Diffley, and J.H. Cocker, Nature 357, 169 (1992).

13. A.S. Buchman, and R.D. Kornberg, Mol. Cell. Biol. 10, 887 (1990).

14. J.A. Huberman, et al., Nucleic Acids Res. 16, 6373 (1988). B.J. Brewer, and W.L. Fangman, Cell 51, 463 (1987).

15. S.P. Bell and B. Stillman, Nature 357, 128 (1992).

16. F.J. McNally, and J. Rine, Mol. Cell. Biol. 11, 5648 (1991).

17. A.M. Miller, and K.A. Nasmyth, Nature 312, 247 (1984).

18. D.H. Rivier, and J. Rine, Science 256, 659 (1992).

19. Two genetic screens were devised to identify temperature sensitive mutations in essential genes involved in silencing. The screen that led to isolation of orc2-1 started with JRY2698 (HMLa, MATa, HMRa, ade2, his3, leu2, trp1, ura3-52), which had a mating-type cassettes at all three chromosomal mating-type loci and was transformed with a plasmid (pJR924) containing the a mating-type cassette at HMR (JRY2700). The plasmid-borne HMRa locus had two synthetic silencers substituted for the E silencer, and also had a deletion of the I element. The use of two silencers rather than one minimized the risk of being distracted by site mutations in the silencer. One hundred and sixty two thousand EMS-mutagenized colonies were grown on supplemented minimal media (without uracil) at 25°C and screened for derepression of the plasmid-borne a cassette at HMR. Mutagenized colonies were replica-plated onto lawns of the mating tester strain DBY1034 (MATa, his4-539, lys2-801, ura3-52) on minimal media either with or without uracil supplementation. Replicas were incubated at 25°C for one hour, then overnight at 30°C. Only plasmid-containing JRY2700 cells were able to mate with the tester strain to yield diploids capable of growing on the unsupplemented plates because the only functional URA3 gene was on the plasmid.

Cells bearing mutations causing derepression of the plasmid-borne a cassette could be distinguished from the other classes of mutations by exploiting a feature of yeast plasmids. Approximately 10% of the cells in these colonies lacked the plasmid and thus could, in principle, mate with the tester strain and form Ura- diploids capable of growth on the plates supplemented with uracil. If a colony had a mutation in the mating response pathway, the cells would be unable to mate even in the absence of the plasmid, and thus would be unable to form diploids capable of growth on medium supplemented with uracil. Twenty eight strains were identified that were temperature-sensitive for growth and that mated with the tester strain only on plates supplemented with uracil. Plasmid-free isolates of each strain were then retransformed with the plasmid bearing the synthetic silencer at the HMRa locus (pJR924) and with the plasmid bearing the wild-type HMRa locus (pJR919; McNally and Rine, 1991). Three strains were able to mate when carrying the wild-type HMR locus (pJR919) but not when carrying the synthetic silencer-containing HMR locus (pJR924). In order to determine if the ts growth phenotype and the mating phenotype were due to the same mutation, spontaneous revertants of the ts phenotype were selected. A spontaneous revertant of the ts growth of one strain, JRY2904, mated as well as the wild-type JRY2700, suggesting that the mating phenotype and temperature-sensitive growth were due to the same mutation, which was named orc2-1.

20. Y. Kassir, et al., Genet. 109, 481 (1985). Foss and Rine, Genetics (1993).

21. The ORC2 gene was cloned by complementation of the temperature sensitivity of orc2-1. An orc2-1 strain (JRY3683) was transformed with a CEN LEU2-based Saccharomyces cerevisiae genomic library (32). Approximately 1000 to 1500 transformants formed colonies at 23°C. Replica prints of these colonies were incubated at 37°C to screen for the ability to grow at elevated temperatures. Plasmids were isolated from temperature-resistant strains and retested. Those plasmids that complemented the defect a second time were analyzed by restriction digestion. One plasmid from the CEN-LEU2 library (pJR1416) was chosen for further analysis.

22. ORC2 was disrupted with the Tn10LUK transposon, which inserted within the ORC2 coding sequence on the plasmid (pJR1146) carrying the SstI orc2-1 complementing fragment. Plasmid pJR1147 had the Tn10LUK insertion within the ORC2 coding region. The ORC2-containing SstI fragment, disrupted by the transposon, was removed from pJR1147 by partial digestion with SstI. The fragment was transformed into the wild-type diploid JRY543. The integration of this disruption allele at the ORC2 locus was confirmed by DNA blot hybridization analysis (Southern, 1975), and the diploid was named JRY3444.

23. P. Hieter, C. Mann, M. Snyder, and R.W. Davis, Cell 40, 381 (1985).

24. D. Koshland, J.C. Kent, and L.H. Hartwell, Cell 40, 393 (1985). R.M. Lynn, et al., Proc. Natl. Acad. Sci. USA 86, 3559 (1989). Eng, S.D. Pandit, and R. Sternglanz, J. Biol. Chem. 264, 13373 (1989).

26. A.H. Brand, G. Micklem, and K. Nasmyth, Cell 51, 709 (1987).

27. S. Shuman, et al., Proc. Natl. Acad. Sci. USA 86, 9793 (1989).

28, 29. J. Singh, and A.J.S. Klar, Genes and Dev. 6, 186 (1992).

30. D.D. Dubey, et al., Mol. Cell. Biol. 11, 5346 (1991).

31. C.A. Hrycyna, et al, EMBO J. 10, 1699 (1991).

32. A mutation was introduced into the RAP1 binding site at HMR-E adjacent to the HMRa locus by oligonucleotide-directed mutagenesis and the change confirmed by sequencing. The RAP1 site mutation was identical to the PAS1-1 mutation of HMR-E characterized previously that blocks RAP1 protein binding in vitro and is described here as HMRa-e-rap1-10. The plasmid consisting of the HMRa-e-rap1-10 HindIII fragment in pRS316 was named pJR1425. The wild-type HMRa version of the same plasmid was named pJR1426. Approximately 100,000 mutagenized cells from 12 independent cultures of the HMLα mata1 HMRa ste14 strain with the HMRa plasmid (pJR1425) were grown into colonies at 23°C and replica-plated to a MATa ura3 mating-type tester lawn (PSY152) to identify mutants exhibiting the a mating phenotype. The mating plates were incubated at 30°C in order to identify mutants defective enough to be derepressed at HMR yet not so defective as to be inviable. Of nine hundred haploid mating-proficient colonies that were picked, fifty mutants were temperature sensitive for growth at 37°C to some degree. These mutants were subjected to further study and the remainder were discarded. All 50 mutants were recessive to wild-type. Only the subset of mutants relevant to ORC2 is presented here; the remainder will be discussed elsewhere.

33. The ORC2 gene was defined by the orc2-1 mutation. An orc2-complementing plasmid (pJR1416) was obtained by complementation of the temperature sensitivity of orc2-1. In order to map the approximate position of the orc2-complementing gene in the plasmid, six derivatives of pJR1416 were made and tested for complementation. The SalI-SalI fragment was removed from the insert to yield pJR1418. Three adjacent XbaI-XbaI fragments were removed to yield pJR1422. SphI cleaved once in the insert and once just inside the vector. Deleting this SphI-SphI fragment produced pJR1417. Cleavage by SstI released two fragments from the insert. Deletion of both fragments created pJR1419. Isolates in which only the larger SstI fragment (pJR1421) or only the smaller SstI fragment (pJR1420) was deleted were also recovered. The 2.8-kb SstI-SstI orc2-complementing fragment was cloned into the SstI site of the CEN URA3 vector pRS316 to yield pJR1263. Two plasmids were made which allowed the chromosomal integration of part or all of ORC2. The first, pJR1423, contained an XhoI/KpnI insert (from pJR1416), which extended from a few kb upstream of the ORC2 start codon to about 60 bp upstream of the stop codon, inserted into XhoI-KpnI-cut pRS306, a yeast integrating vector marked by URA3. The second plasmid, pJR1424, contained the SstI orc2-complementing fragment inserted into the SstI site of pRS306.

34. F. Spencer, et al, Genetics 124, 237 (1990).

35. O. Huisman, et al, Genetics 116, 191 (1987).

36. E.M. Southern, J. Mol. Biol. 98, 503 (1975).

37. T.A. Kunkel, et al, Methods Enzymol. 154, 367 (1987).

38. R.S. Sikorski, and P. Hieter, Genetics 122, 19 (1989).

Example 3.

In order to identify potential yeast initiators, we developed a genetic strategy, the one-hybrid system, to find proteins that recognize a target sequence of interest. The one-hybrid system has two basic components: (i) a hybrid expression library, constructed by fusing a transcriptional activation domain to random protein segments, and (ii) a reporter gene containing a binding site of interest in its promoter region. Hybrid proteins that recognize this site are expected to induce expression of the reporter gene because of their dual ability to bind the promoter region and activate transcription. This association may be indirect, since hybrids that interact with endogenous proteins already occupying the binding site will also activate transcription. Nevertheless, as long as the association is sequence-specific, the protein incorporated in the hybrid should be functionally relevant.

We have used this method to look for proteins from the yeast Saccharomyces cerevisiae that recognize the ARS consensus sequence (ACS) of yeast origins of DNA replication. The protein component of this screen was provided by a set of three complementary yeast hybrid expression libraries, YL1-3, containing random yeast protein segments fused to the GAL4 transcriptional activation domain (GAL4AD). The reporter gene for our screen contained four direct repeats of the ACS in its promoter region and was integrated into the yeast strain GGY1 to form JLY363 (ACS WT). To determine the dependence of lacZ induction on the ACS, we constructed in parallel JLY365 (ACS MUTANT), which harbors a reporter gene carrying four copies of a nonfunctional multiply-mutated ACS (Fig. 4). We isolated nine plasmids that induced greater lacZ activity in JLY363 (ACS WT) than in JLY365 (ACS MUTANT) from a screen of 1.2 million YL1-3 transformants. Many of the plasmids that induced lacZ activity on initial screening of the library in JLY363 (ACS WT) failed to exhibit a dependence on the ACS when introduced into JLY365 (ACS MUTANT). Restriction analysis of these plasmids showed that the nine isolates represented five genomic clones, which we initially labeled AAP1-5 for ACS associated protein. AAP1 was isolated four times, one other clone twice, and the remaining clones only once.
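
The counts reported for the screen can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the per-library numbers are taken from the citation notes for this example, it assumes that YL1 and YL2 each contributed 500,000 transformants, and the clone labels other than AAP1 are placeholders, since the text ties only the count of four to a named clone.

```python
# Transformants screened per library (assumes 500,000 each for YL1 and YL2).
transformants = {"YL1": 500_000, "YL2": 500_000, "YL3": 200_000}
total_screened = sum(transformants.values())

# Isolates that remained positive after colony purification, per library.
purified_positives = {"YL1": 15, "YL2": 22, "YL3": 12}
total_positives = sum(purified_positives.values())

# Recoveries of each of the five clones among the ACS-dependent plasmids.
# Only AAP1 is named with its count in the text; the other keys are placeholders.
recoveries = {"AAP1": 4, "clone_b": 2, "clone_c": 1, "clone_d": 1, "clone_e": 1}
acs_dependent_plasmids = sum(recoveries.values())

print("transformants screened:", total_screened)
print("purified positives:", total_positives)
print("ACS-dependent plasmids:", acs_dependent_plasmids, "in", len(recoveries), "clones")
```

Under these assumptions the totals agree with the 1.2 million transformants, 49 purified positives, nine plasmids and five clones stated in the text.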

To examine the sequence specificity of lacZ induction with finer resolution, reporter constructs containing direct repeats of four ACS point mutants were each integrated into GGY1 to generate the set of reporter strains (10). The five AAP clones were individually examined in these strains for the ability to induce lacZ expression. AAP1 displayed a correspondence between the induction of this set of reporter genes and the ARS function (12) of their ACS. The AAP5 hybrid exhibited a slightly weaker correlation, and the remaining clones showed poor correlation. These findings suggest that AAP1, and possibly AAP5, encodes a protein that recognizes the ACS in a sequence-specific manner. Constructs with deletions in the AAP1 coding sequence (14) were unable to induce lacZ expression, indicating that recognition of the ACS resided in the protein segment fused to GAL4.

The genomic segments fused to the GAL4AD in AAP1-5 were sequenced to determine the extent of the hybrid proteins that were made. AAP1 and AAP5 had sizable protein coding sequences of 301 and 123 amino acids, respectively, fused in frame with the GAL4AD. In principle, these segments are large enough to direct the hybrid protein to the promoter of the reporter gene. AAP2-4 encoded hybrid proteins with only short peptide extensions (10, 22, and 38 amino acids, respectively) fused to the GAL4AD, suggesting that these hybrids were not responsible for the transcriptional induction attributed to these clones. Because of this finding and the lack of proper sequence specificity for the ACS element, AAP2-4 were not studied further.

The full-length gene for AAP1 was cloned from a yeast genomic library and sequenced (15) (Genbank accession no. L23323). AAP1 contains an open reading frame for a protein 435 amino acids long with a predicted molecular weight of 50,302 daltons. The hybrid GAL4AD-AAP1 protein obtained from the screen was a fusion of the GAL4AD to the C-terminal two-thirds of the predicted full-length protein (residues 135-435), indicating that this portion of the molecule is sufficient for association with the ACS. Comparison of peptide sequences from the 50 kd subunit of ORC with the predicted protein sequence from AAP1 demonstrated that our gene encodes this subunit and confirmed the association between the AAP1 protein and the ACS. Because of this identity, we have renamed our gene ORC6.
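
As a quick consistency check on the figures above, the following sketch computes the length of the fused segment from its residue range and the fraction of the full-length protein it represents. The function and variable names are illustrative only.

```python
def segment_length(first_residue: int, last_residue: int) -> int:
    """Number of residues in an inclusive residue range."""
    return last_residue - first_residue + 1

FULL_LENGTH_AA = 435          # predicted length of the AAP1 (ORC6) protein
PREDICTED_MW_DA = 50_302      # predicted molecular weight in daltons

hybrid_aa = segment_length(135, 435)           # segment fused to the GAL4AD
fraction_of_protein = hybrid_aa / FULL_LENGTH_AA
mean_residue_mass = PREDICTED_MW_DA / FULL_LENGTH_AA

print(hybrid_aa, round(fraction_of_protein, 2), round(mean_residue_mass, 1))
```

The range 135-435 gives 301 residues, matching the 301-amino-acid coding segment reported for the AAP1 hybrid, and corresponds to roughly two-thirds of the protein.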

An overlapping ORF capable of encoding a protein 250 amino acids long exists on the complementary strand. The positions of the predicted start and stop codons for this ORF are at nt 1615-7 and nt 865-7, respectively. In pJL766 the C residue at 1471 was mutated to a T, preserving the amino acid sequence of ORC6 but introducing a stop codon in this overlapping ORF. The sequence of ORC6 indicates a connection with the regulatory machinery governing cell cycle progression. Orc6p contains four phosphorylation sites, (S/T)PXK, for cyclin-dependent protein kinases (20) clustered in the first half of the molecule. Using the more relaxed consensus site (S/T)P adds two more sites to this cluster. We have observed Orc6p phosphorylated in vivo on serine and threonine residues.

However, since the initiation of yeast DNA replication commences promptly in response to the activation of this protein kinase in G1, we believe that Orc6p and possibly other ORC subunits are regulated substrates of this kinase. Finally, as expected for a protein participating in nuclear events, Orc6p contains a potential nuclear localization signal (NLS) within the (S/T)PXK cluster and one in the C-terminal domain (amino acid residues 117-122 and 263-279). Orc6p can be seen in the nucleus by immunofluorescence.
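
The strict and relaxed consensus sites discussed above, (S/T)PXK and (S/T)P, can be located in a protein sequence with a simple pattern scan. The sketch below is a minimal illustration; the sequence it scans is an invented toy string, not the Orc6p sequence, and the function name is hypothetical.

```python
import re

STRICT_CDK_SITE = "[ST]P.K"   # (S/T)PXK, where X is any residue
RELAXED_CDK_SITE = "[ST]P"    # (S/T)P

def find_motif(sequence: str, pattern: str) -> list[int]:
    """Return the 1-based start position of every match, including overlapping ones."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [match.start() + 1 for match in lookahead.finditer(sequence)]

# Toy sequence invented for illustration only (NOT Orc6p).
toy_sequence = "MSTPAKLLSPEKGGTPQRASPLKTPNN"

strict_sites = find_motif(toy_sequence, STRICT_CDK_SITE)
relaxed_sites = find_motif(toy_sequence, RELAXED_CDK_SITE)
relaxed_only = sorted(set(relaxed_sites) - set(strict_sites))

print("strict:", strict_sites)
print("relaxed only:", relaxed_only)
```

Every strict site is also a relaxed site, so the relaxed scan can only add positions, which is the sense in which the relaxed consensus "adds" sites to the cluster.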

A marked deletion of the ORC6 gene (pJL731), removing all but 13 codons from its open reading frame, was introduced into diploids from three different strain backgrounds. The resulting heterozygous ORC6 deletion strains, JLY481, JLY475, and JLY469, were induced to undergo meiosis, and 20 tetrads of each strain were dissected. In all backgrounds the ORC6 disruption cosegregated with inviability, demonstrating that ORC6 is essential for cell growth.

Microscopic examination revealed that mutant spores from JLY481 and JLY475 germinated, completed 1-2 rounds of cell division, and then arrested with a uniform large bud morphology reminiscent of cell division cycle mutants defective in DNA replication or nuclear division. The position of cell cycle arrest could not be pinpointed, however, since the DNA content of these cells could not be readily measured. Mutant spores derived from JLY469 germinated poorly.

The interpretation of these ORC6 deletion experiments was complicated by the presence of a second open reading frame (ORF2) of 250 amino acids on the antisense strand of the ORC6 gene. ORF2 spans nucleotides 1617 to 868 of the Genbank sequence and overlaps the C-terminal two-thirds of the ORC6 coding sequence. A marked deletion that removed the N-terminal third of the ORC6 coding sequence without affecting ORF2 (pJL733) was introduced into diploids. Tetrad analysis again showed the ORC6 deletion cosegregating with cell death. Finally, an ORC6 gene was constructed that contains a silent codon change for the ORC6 ORF but introduces a UGA stop codon in ORF2. This gene was able to rescue a haploid strain containing a full deletion of the ORC6 ORF.

We conclude that ORC6 is essential for cell viability.
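
The ORF2 coordinates quoted above can be checked for internal consistency. The sketch below assumes that nt 1617 is the first base of the ORF2 start codon and that ORF2 is read from high to low nucleotide numbers because it lies on the complementary strand; the helper name is hypothetical. It also locates the mutated nucleotide 1471 within the ORF2 reading frame.

```python
ORF2_FIRST_NT = 1617     # first base of the ORF2 start codon (top-strand numbering)
ORF2_LAST_CODING_NT = 868
ORF2_STOP_LAST_NT = 865  # last base of the stop codon (nt 865-7)

def codon_and_position(orf_first_nt: int, nt: int) -> tuple[int, int]:
    """For an ORF on the complementary strand, return (codon number, position
    within the codon), both 1-based, for a top-strand nucleotide coordinate."""
    offset = orf_first_nt - nt
    return offset // 3 + 1, offset % 3 + 1

coding_nt = ORF2_FIRST_NT - ORF2_LAST_CODING_NT + 1       # excludes the stop codon
coding_codons, remainder = divmod(coding_nt, 3)
total_codons = (ORF2_FIRST_NT - ORF2_STOP_LAST_NT + 1) // 3  # includes the stop codon

mutated_codon, mutated_position = codon_and_position(ORF2_FIRST_NT, 1471)

print(coding_nt, coding_codons, remainder, total_codons)
print(mutated_codon, mutated_position)
```

The 750 coding nucleotides give 250 codons, in agreement with the 250-amino-acid ORF2. Under the stated assumption, nt 1471 falls in the third position of an ORF2 codon; a C-to-T change on the ORC6 strand reads as G-to-A on the complementary strand, which is compatible with a change such as UGG to UGA, although the text states only that a UGA stop codon is introduced.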

Our results validate the one-hybrid system screen as a method to identify and clone genes for proteins that recognize a DNA sequence of interest. This screen has also been successful in identifying DNA-binding proteins, and a variation of this screen has been used to identify a binding site for a suspected DNA-binding protein. The one-hybrid approach is particularly useful for proteins that are difficult to detect biochemically or for which starting material in a purification is difficult to obtain.

We identified genes that interact genetically with ORC6 using established cdc mutants because germinating spores bearing an ORC6 deletion appeared to exhibit a cell division cycle phenotype. pJL749, a plasmid that overexpresses Orc6p several hundred-fold, was introduced into a virtually isogenic set of temperature-sensitive cdc mutants arresting at various points in the cell cycle (29). Overexpression of ORC6 selectively affected cdc6 and cdc46 mutants, lowering their restrictive temperature by 5-7°C; there was no significant effect on the other mutants examined or on the wild-type strain (Table 1).

Strain    cdc mutant    Viability with overexpression of ORC6
RDY488    wild-type     unchanged
RDY501    cdc28-1       unchanged
RDY510    cdc4-1        unchanged
RDY664    cdc34-2       unchanged
RDY543    cdc7-4        unchanged
JLY310    cdc6-1        lowered
JLY179    cdc46-1       lowered
JLY338    cdc2-1        unchanged
JLY353    cdc17-1       unchanged
RDY619    cdc15-2       unchanged

Table 1. Viability of cdc Mutants in the Presence of High Levels of ORC6 Expression. JL749 (GALp-HA-ORC6), JL772 (GALp-HA), and RS425 were introduced into each cdc mutant, and examined for growth at various temperatures under conditions that induce expression of ORC6 (28, 29). "Unchanged" indicates strains whose restrictive temperature remains unchanged in the presence of JL749 relative to JL772 and RS425. "Lowered" indicates mutants whose restrictive temperature is lowered 5-7°C when JL749 is present.

Numbered Citations for Example 3

1. Kelly, J. Biol. Chem. 263, 17889 (1988); Marians, Annu. Rev. Biochem. 61, 673 (1992); Kornberg, Baker, DNA Replication (Freeman and Company, New York, 1992); B. Stillman, Annu. Rev. Cell Biol. 5, 197 (1989).

2. M. L. DePamphilis, Annu. Rev. Biochem. 62, 29 (1993).

3. Campbell and Newlon, in The Molecular and Cellular Biology of the Yeast Saccharomyces Broach, et al, Eds. (CSHL Press, 1991), vol. 1, pp. 41-146.

4. Fangman and Brewer, Annu. Rev. Cell Biol. 7, 375 (1991).

5. J.R. Broach et al., Cold Spring Harbor Symp. Quant. Biol. 47, 1165 (1983); Van Houton and C. S. Newlon, Mol. Cell. Biol. 10, 3917 (1990).

6. Y. Marahrens and B. Stillman, Science 255, 817 (1992).

7. S. Fields and Song, Nature 340, 245 (1989); Chien, P.T. Bartel, R. Sternglanz, S. Fields, Proc. Natl. Acad. Sci. USA 88, 9578 (1991).

8. R. Brent and M. Ptashne, Cell 43, 729 (1985).

9. The N-terminal portions of the hybrids from three related hybrid expression libraries, YL1-3, consist of the SV40 nuclear localization signal and amino acids 768-881 of the GAL4 activation domain (GAL4AD). The C-terminal portions were derived from random yeast protein segments which have been fused to the end of the GAL4AD. These segments are encoded by short (1-3 kb) fragments from a Sau3a partial digest of yeast genomic DNA. Together, YL1-3 ensure that all three reading frames of these fragments can be expressed.

10. pLR1Δ1 is described in R.W. West Jr., R.R. Rogers, M. Ptashne, Mol. Cell. Biol. 4, 2467 (1984). We generated pBgl-lacZ from pLR1Δ1 by (i) substituting an XhoI-BglII-XhoI polylinker for the XhoI linker and (ii) precisely excising a HindIII fragment containing 2µ sequences. The resulting vector has a unique BglII site approximately 100 bp upstream of the TATA box for insertion of DNA sequences in the promoter region and a unique StuI site for targeted integration of the plasmid at the URA3 locus. Multiple direct repeats of ARS1 domain A and several of its mutant derivatives were inserted into the BglII site of pBgl-lacZ to generate all the reporter genes used in this work. The inserted repeat elements, derived from complementary oligonucleotides, were oriented with the TATA box to their right. Each reporter gene construct was integrated into the URA3 locus of GGY1 (MATa Δgal4 Δgal80 ura3 leu2 his3 ade2 tyr) [Gill and M. Ptashne, Cell 51, 121 (1987)] to create a reporter strain. Integration of pBgl-lacZ into GGY1 generated JLY387.

11. YEPD (rich complete) and SD (synthetic dropout) media are as described [Hicks and I. Herskowitz, Genetics 83, 245 (1976)]. Standard methods were used for manipulation of yeast cells [Guthrie and G.R. Fink, Ed., Guide to Yeast Genetics and Molecular Biology (Academic Press, San Diego 1991)] and DNA [Ausubel et al., Ed., Current Protocols in Molecular Biology (Wiley, New York 1989)]. Libraries YL1-3 were transformed [Schiestl and R.D. Gietz, Current Genetics 16, 339 (1989)] into JLY363 (10) and plated on SD-Leu at a density of 2-5000 colonies per 10 cm plate. 500,000 transformants were obtained for each of YL1 and YL2, and 200,000 for YL3. Transformants were assayed on filters for production of β-galactosidase [Breeden and K. Nasmyth, Cold Spring Harbor Symp. Quant. Biol. 47, 643 (1985)]. 49 isolates remained positive after colony purification (15 from YL1, 22 from YL2, 12 from YL3), and library plasmids were extracted from them. These plasmids were each transformed into both JLY363 and its mutant counterpart JLY365. Nine plasmids induced greater β-galactosidase activity in the wild-type reporter strain than in the control. These plasmids were classified into five clones, AAP1-5, based on their HindIII restriction pattern. Each clone was then retested in JLY360, JLY361, JLY387, JLY429, JLY431, JLY433, and JLY435. The AAP1 hybrid clone was called pJL720. The AAP1 gene was later renamed ORC6.

12. The ARS function of the mutant sequences was analyzed in the context of ARS1 domain B (BglII-HinfI fragment, nt 853-734) in the following CEN-based URA3-containing plasmids: pJL347, pJL243 (multiple), pJL326 (A863T), pJL338 (T869A), pJL330 (T862C), and pJL316 (T867G). These plasmids were transformed into JLY106 (MATa ura3 leu2 his3 trp1 lys2 ade2) and its homozygous diploid counterpart JLY162. pJL243, pJL326, and pJL338 did not yield a high frequency of transformation and could not be assayed quantitatively for ARS function. pJL347, pJL330, and pJL316 transformed cells with high efficiency and were assayed for mitotic stability [Stinchcomb, et al., Nature 282, 39 (1979)].

13. pJL720, the ORC6 hybrid construct originally isolated from the YL3 library, has two BamHI sites. The 5' site created by the hybrid junction corresponds to the Sau3a site at nt 843. Excision of the segment between the two sites generated pJL721, leaving amino acids 339-435 in frame with the GAL4AD. pGAD3R (11), the parent vector for the YL3 library, contains no ORC6 sequence. pRS425, Christianson, et al., Gene 110, 119 (1992), contains no components of the fusion protein.

14. All sequencing was performed with Sequenase (USB) on collapsed double-stranded templates. The protein coding segments of the AAP1-5 hybrid clones were sequenced from their junction with the GAL4AD to their stop codon. Two of the ORC6 sequencing primers were used as colony hybridization probes to screen a high copy number yeast genomic library [Carlson and D. Botstein, Cell 28, 145 (1982)] for a clone of the full-length ORC6 gene (pJL724). The full-length gene was sequenced on both strands using oligonucleotide primers positioned approximately 200 nt apart.

15. S. P. Bell and B. Stillman, Nature 357, 128 (1992).

16. Hodgman, Nature 333, 22 (1988); Walker et al., EMBO J. 1, 945 (1982).

17. P. Linder, et al., Nature 337, 121 (1989).

18. E. A. Nigg, Seminars in Cell Biology 2, 261 (1991).

19. ORC6 deletions were constructed by replacing nucleotides 458-1721 (pJL731) or nucleotides 458-846 (pJL733) of the Genbank sequence with the URA3 HindIII fragment oriented in the opposite direction to that of the ORC6 sequence. Each construct was used to generate heterozygous deletions of ORC6 in diploid strains by one-step gene replacement. ORC6 deletion analysis was performed in JLY461 (MATa/MATα ura3/ura3 leu2/leu2 his3/his3 trp1/trp1 ade2/ade2 [cif]), JLY462 (MATa/MATα ura3/ura3 leu2/leu2 trp1/trp1 his4/his4 can1/can1), and JLY463 (MATa/MATα ura3/ura3 leu2/leu2 trp1/trp1 his3/HIS3); their respective genetic backgrounds are S288c, EG123, and A364a. Disruption of JLY461, JLY462, and JLY463 by pJL731 (full deletion) created JLY481, JLY475, and JLY469, respectively. Disruption of JLY461, JLY462, and JLY463 by pJL733 (N-terminal deletion) created JLY485, JLY479, and JLY473, respectively. These heterozygous marked deletion strains were sporulated, and twenty tetrads of each were dissected and grown on YEPD to assess viability.

20. Pringle and Hartwell, in The Molecular Biology of the Yeast Saccharomyces, Strathern, et al, Eds. (CSHL Press, CSH, 1981), vol. 1, pp. 97-142.

21. A point mutant (pJL766) was made by replacing the BamHI-SphI fragment of the full-length clone with a BamHI/SphI fragment generated by PCR from pJL720 using primers. One mutation changes nucleotide 1471 of the Genbank sequence from C to T and was confirmed by sequence analysis.

22. M. M. Wang and R. R. Reed, Nature 364, 121 (1993).

23. T. E. Wilson, et al., Science 252, 1296 (1991).

24. J. F. X. Diffley and J. H. Cocker, Nature 357, 169 (1992).

25. pJL749 contains the GAL1 promoter (nt 146-816) driving the expression of ORC6 (nt 443-2298) in the high-copy yeast shuttle vector pRS425 [T. W. Christianson, et al., Gene 110, 119 (1992)].

26. The cdc mutant strains have been backcrossed 4-5 times against two congenic strains derived from A364a: RDY487 (MATa leu2 ura3 trp1) and RDY488 (MATa leu2 ura3 trp1). All are ura3 leu2 trp1. RDY510, RDY664, JLY310, and JLY179 are MATa; the rest are MATa. Additional markers can be found in JLY310 (ade2), RDY543 (his3), and RDY619 (pep4Δ::TRP1 his3 ade2). pJL749, pJL772, and pRS425 (28) were transformed into these strains and plated on SD-LEU at 22°C. Four colony-purified isolates from each transformation were patched onto SD-LEU plates and replica-plated to SGAL-LEU plates, all at 22°C. The patches on SGAL-LEU were replica-plated to a series of pre-warmed SGAL-LEU plates at 22°, 25°, 27°, 30°, 32.5°, 35°, 37°, and 38°C. The viability of cdc mutants containing pJL749 was compared to those containing pJL772 and pRS425.

27. Hartwell, JMB 104, 803 (1976); Hennessy, et al, G&D 4, 2252 (1990).

28. Chen, et al., PNAS 89, 10459 (1992); Hogan, et al, ibid. 89, 3098.

29. B.J. Andrews and S.W. Mason, Science. 261, 1543 (1993).

Example 4. Orc protein purification and gene cloning

Protein Purification: To obtain sufficient protein for peptide sequencing, a revised purification procedure for ORC was devised, based on the procedure reported previously (Bell and Stillman, 1992). Whole cell extract was prepared from 400 g of frozen BJ926 cells (frozen immediately after harvesting a 300 liter logarithmically growing culture, total of 1.6 kg per 300 liters). All buffers contained 0.5 mM PMSF, 1 mM benzamidine, 2 mM pepstatin A, 0.1 mg/ml bacitracin and 2 mM DTT. 400 ml of 2X buffer H/0.1 (100 mM Hepes-KOH, pH 7.5, 0.2 M KCl, 2 mM EDTA, 2 mM EGTA, 10 mM Mg acetate, and 20% glycerol) was added to the cells and, after thawing, the cells were broken using a bead beater (Biospec Products) until greater than 90% cell breakage was achieved (twenty 30 second pulses separated by 90 second pauses). After breakage was complete, the volume of the broken cells was measured and one twelfth volume of a saturated (at 4°C) solution of ammonium sulfate was added and stirred for 30 minutes. This solution was then spun at 13,000 x g for 20 minutes. The resulting supernatant was transferred to 45Ti bottle assemblies (Beckman) and spun in a 45Ti rotor at 44,000 RPM for 1.5 hrs. The volume of the resulting supernatant was measured and 0.27 g/ml of ammonium sulfate was added. After stirring for 30 minutes, the precipitate was collected by spinning in the 45Ti rotor at 40,000 RPM for 30 minutes. The resulting pellet was resuspended using a B-pestle Dounce in buffer H/0.0 (50 mM Hepes-KOH, pH 7.5, 1 mM EDTA, 1 mM EGTA, 5 mM Mg acetate, 0.02% NP-40, 10% glycerol) and dialyzed versus H/0.15 M KCl (Buffer H with 0.15 M KCl). This preparation typically yielded 12-16 g soluble protein (determined by Bradford assay with a bovine serum albumin standard).

Preparation of ORC from this extract was essentially as described (Bell and Stillman, 1992) with the following changes (column sizes used for preparation of ORC from 400 g of cells are indicated in parentheses). The S-Sepharose column was loaded at 20 mg protein per ml of resin (300 ml). The Q-Sepharose (50 ml) and sequence-specific affinity (5 ml) columns were run as described, but the dsDNA cellulose column was omitted from the preparation. Only a single glycerol gradient was performed, in an SW-41 rotor spun at 41,000 RPM for 20 hrs. We estimate a yield of 130 µg of ORC complex (all subunits combined) per 400 g of yeast cells.
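
Two pieces of arithmetic are implicit in the purification described above: the dilution of the 2X buffer by the cell paste, and the fraction of soluble protein recovered as ORC. The sketch below is an illustration under a stated assumption that is not in the text, namely that 400 g of cell paste occupies roughly 400 ml, so that 400 ml of 2X buffer is diluted about two-fold. The function names are hypothetical.

```python
# Concentrations of 2X buffer H/0.1 as listed in the text (mM, except glycerol in %).
BUFFER_2X = {
    "Hepes-KOH": 100.0,
    "KCl": 200.0,
    "EDTA": 2.0,
    "EGTA": 2.0,
    "Mg acetate": 10.0,
    "glycerol (%)": 20.0,
}

def dilute(concentrations: dict, stock_ml: float, final_ml: float) -> dict:
    """Scale every concentration by the ratio of stock volume to final volume."""
    factor = stock_ml / final_ml
    return {name: value * factor for name, value in concentrations.items()}

def orc_fraction_of_protein(orc_ug: float, soluble_protein_g: float) -> float:
    """Mass fraction of the soluble protein that is recovered as ORC."""
    return (orc_ug * 1e-6) / soluble_protein_g

# ASSUMPTION: 400 g of cells contributes about 400 ml of volume.
working_buffer = dilute(BUFFER_2X, stock_ml=400.0, final_ml=800.0)

low = orc_fraction_of_protein(130.0, 16.0)
high = orc_fraction_of_protein(130.0, 12.0)

print(working_buffer)
print(f"ORC is about {low:.1e} to {high:.1e} of the soluble protein by mass")
```

Under that assumption the diluted concentrations match those listed for buffer H (50 mM Hepes-KOH, 1 mM EDTA, 1 mM EGTA, 5 mM Mg acetate, 10% glycerol) with 0.1 M KCl, and the reported yield corresponds to roughly one part ORC per hundred thousand parts soluble protein.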

Protein Sequencing: Digestion of ORC subunits was performed using an "in gel" protocol described by Kawasaki and Suzuki with some modification. Briefly, purified ORC (10 µg per subunit) was first separated by 10% SDS-PAGE and stained with 0.1% Coomassie Brilliant Blue G (Aldrich) for 15 min. After destaining (10% methanol, 10% acetic acid), the gel was soaked in water for one hour, then the protein bands were excised, transferred to a microcentrifuge tube and cut into 3-5 pieces to fit snugly into the bottom of the tube. A minimum volume of 0.1 M Tris-HCl (pH 9.0) containing 0.1% SDS was added to completely cover the gel pieces. Then 200 ng of Achromobacter protease I (Lysylendopeptidase: Wako) was added and incubated at 30°C for 24 hrs. After digestion the samples were centrifuged and the supernatant was passed through an Ultrafree-MC filter (Millipore, 0.22 µm). The gel slices were then washed twice in 0.1% TFA for one hour and the washes were recovered and filtered as above.

All filtrates were combined and reduced to a volume suitable for injection on the HPLC using a speed-vac. The digests were separated by reverse-phase HPLC (Hewlett-Packard 1090 system) using a Vydac C18 column (1 x 250 mm, 300 angstroms) with an ion exchange pre-column (Brownlee GAX-013, 3.2x). The peptides were eluted from the C-18 column by increasing acetonitrile concentration and monitored by their absorbance at 214, 280, 295, and 550 nm.

Amino acid sequencing of the purified peptides was performed on an automated sequencer (Applied Biosystems model 470) with on-line HPLC (Applied Biosystems model 1020A) analysis of PTH-amino acids.

ORC SUBUNIT CLONING:

ORC1: To clone the gene for the largest (120 kd) subunit of ORC, the following degenerate oligonucleotide primers 1201 and 1202 were synthesized based on the sequence of the first ORC1 peptide. These oligos were used to perform PCR reactions using total yeast genomic DNA from the strain W303 a as target. A 48 base pair fragment was specifically amplified. This fragment was subcloned and sequenced. The resulting sequence encoded the predicted peptide, indicating that it was the correct amplification product. A radioactively labeled form of the PCR product was then used to probe a genomic library of yeast DNA sequences, resulting in the identification of two overlapping clones. Sequencing of these clones resulted in the identification of a large open reading frame that encoded a protein with a predicted molecular weight of 120 kd and that encoded all four of the ORC1 peptide sequences.
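
Degenerate primers of the kind described above must cover every codon that could encode the chosen peptide residues, so their complexity grows with the product of the per-residue codon counts. The sketch below illustrates that calculation using the standard genetic code. The example peptides are invented for illustration and are not the ORC1 peptide or the primers 1201 and 1202; the function names are hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def primer_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that can encode the peptide."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide.upper())

def whole_codons_spanned(product_bp: int) -> int:
    """Number of complete codons contained in a PCR product of the given length."""
    return product_bp // 3

# Hypothetical peptides: one rich in low-degeneracy residues, one in high-degeneracy residues.
low_degeneracy = primer_degeneracy("MKWDEF")
high_degeneracy = primer_degeneracy("LSRAGT")

print(low_degeneracy, high_degeneracy, whole_codons_spanned(48))
```

The comparison shows why peptide stretches rich in Met, Trp and two-codon residues are preferred for primer design, and that a 48 base pair product spans 16 codons, a length compatible with a product amplified from within a single sequenced peptide.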

ORC3: To clone the gene for the 62 kd subunit of ORC, the following degenerate oligonucleotide primers 621 and 624 were synthesized based on the sequence of the third peptide. These oligos were used to perform PCR reactions using total yeast genomic DNA from the strain W303 a as target. A 53 base pair fragment was specifically amplified. This fragment was subcloned and sequenced. The resulting sequence encoded the predicted peptide, indicating that it was the correct amplification product. A radioactively labeled form of the PCR product was then used to probe a genomic library of yeast DNA sequences, resulting in the identification of two overlapping clones. Sequencing of these clones resulted in the identification of a large open reading frame that encoded a protein with a predicted molecular weight of 71 kd and encoded all three of the ORC3 peptide sequences. The inconsistency of the molecular weight is presumably due to anomalous migration of this protein during SDS-PAGE.

ORC4: By comparing the sequence of the ORC4 peptides to that of the known potentially protein-encoding sequences in the Genbank database, we found that a portion of the ORC4 coding sequence had been previously cloned in the process of cloning the adjacent gene. Using the information from the database we were able to design a perfect-match oligo and use it to immediately screen a yeast library. Using this oligo as a probe of the same yeast genomic DNA library, a lambda clone was isolated that contained the entire ORC4 gene. This gene encoded a protein of predicted molecular weight 56 kd and also all of the peptides derived from the peptide sequencing of the 56 kd subunit.

ORC5: To clone the gene for the 53 kd subunit of ORC, the following degenerate oligonucleotide primers 535 and 536 were synthesized based on the sequence of the first ORC5 peptide. These oligos were used to perform PCR reactions using total yeast genomic DNA from the strain W303 a as target. A 47 base pair fragment was specifically amplified. This fragment was subcloned and sequenced. The resulting sequence encoded the predicted peptide, indicating that it was the correct amplification product. A radioactively labeled form of the PCR product was then used to probe a genomic library of yeast DNA sequences, resulting in the identification of a single lambda clone. Sequencing of this clone resulted in the identification of a large open reading frame that encoded several of the peptide sequences derived from the 53 kd subunit of ORC, indicating that this was the correct gene. However, the sequence of the 5' end of the gene was not present in this lambda clone. Fortuitously, mutations in the same gene had also been picked up in the same screen that resulted in the identification of the ORC2 gene. A complementing clone to this mutation was found to overlap with the lambda clone and contain the entire 5' end of the gene. Sequencing of this complementing DNA fragment resulted in the identification of the entire sequence of the ORC5 gene.

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

SEQUENCE LISTING

GENERAL INFORMATION:
(i) APPLICANT: COLD SPRING HARBOR LABORATORY; THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
(ii) TITLE OF INVENTION: ORC GENES, RECOMBINANT ORC PEPTIDES AND METHODS OF IDENTIFYING DNA BINDING PROTEINS
(iii) NUMBER OF SEQUENCES: 12
(iv) CORRESPONDENCE ADDRESS:
ADDRESSEE: FLEHR, HOHBACH, TEST, ALBRITTON HERBERT
STREET: 4 Embarcadero Center, Suite 3400
CITY: San Francisco
STATE: California
COUNTRY: USA
ZIP: 94111-4187
(v) COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release Version #1.25
(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER:
FILING DATE:
CLASSIFICATION:
(viii) ATTORNEY/AGENT INFORMATION:
NAME: Osman, Richard A
REGISTRATION NUMBER: 36,677
REFERENCE/DOCKET NUMBER: FP--59032-PC/RAO
(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: (415) 781-1989
TELEFAX: (415) 398-3249
TELEX: 910 277299

INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 4940 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATAACATGCT CGCCCTTTTA I CTTCTATTTA TTAGTTTTAT ATGGTATGGA GTGTATAATG CCAAAAATTT ACCAAGAAAA AAATATCCGG TACATTCTAT CAGTATTAAG ATAAGGACTG TAACTTGGCG CCAAATTAGA TGTCTCATCG TTATATCAAC CTCTTCCTCG ACTTATTTTT CAACAACATT GAGAAGAGAT TTCAGCCTAA AAGTTTCCAG

AGAGTAGATA

AATAACAACT GATGAGCAGG AGGTGCAAAA ACTGAACATT TGATAGTGTA GTCATGCACA GTTGAGACTT AATACATTAA GTTTGAAGTC AATCCTTTAG TCCATCA AGAAAGAG GGCACAA2ATT TCAGACGCAG TGAAGATGAA TCATCTGATT

TATTATGACA

CTTTTAATTG

GTTTATAATT

AAAATTAAGA

TACAAATATG

CTATGGGGCA

AAAGATATAA

AAATATT'.CA

TATTAACGTT

GAAGTTGCGC

AAGCAGGAAC

AGATGGCAAA

GAAkATATAAT

ACTTAAAGAG

ACGAAGCCGC

ATAATGTTGT

CTCATTATAG

ATAAACTGTT

AATTGCAGCT

TATTGAAAGG

CTGGGGAGAA

AGCCAAGGGA

TCAAAAGAGG

AAACAAGAGC

ATGAAAGTCC

GAAAGAATAT

ATGATGTGTC

TCCCCTAAGA

ATACTACACA

TT'TGTACAAT

TTTTTTGTCT

ACCTCAAATA

CCAACGAACA

GACACGGCCA

AAAGGGAAAG

TCATTCCCTA

AACGTTGAAG

CGATGGAGGT

AAGTTCTGAT

TGGGACTTAC

CGAACTCTGG

GCAGTTTAAT

TTCTGAAACT

ATTTAACTTT

AAATGTCGAT

ATTTGTGGAC

AGCCCAGGAA

TCCTCAAAAG

TACAGATATA

GTCAGATATC

ATATATTCAT

CATAGAATTT

TGACACAAAA

ATTGATGCTT

GTAAGCCCCT

TACTGGGTAT

TTTGAAATTC

CCACTACATA

GATCGAAALAT

AAAA CTG CAT

TTGATTAATA

GATTTACAGG

CAGAAGAGAT

GGAATTAAAC

TCCGTTTATA

GCTCTCACCT

CCTGACGCTA

GCAAATAAAA

ATCAGGGTTG

CCAGAAAGAG

ATTAATATTG

TATTTGAAAG

AAAGATAAGG

ACGGATAATG

GACGTTAGCG

ATATAAGATG AAGTAAGTGC 120 AAATGTTCTC 180 GGGTTATTTT 240 TCATAATGGT 300 CACAGGATAA 360 TTTGGTGACC 420 TGTAACTACT 480 CATAGAAAAA 540 AGGCGGCAAA 600 CTCATTACAA 660 GTTGGGAGAT 720 TACGCCGAAG 780 TAGGTCGTGG 840 TGATCCAGGA 900 ATTTACGATG 960 ACATTTTGAA 1020 ATGAACTGTA 1080 CCAACGTAAT 1140 ACTTTACAGT 1200 AGGATGTCAA 1260 ATTTAACACT 1320 CTACTCAAAC 1380 AGGACGGTAA 1440 AGGATATGGA 1500 CAGCGGTGAA ATATCCGCAG ATGAGCTTGA GGAAGAAGAA GACGAAGAAG AAGACGAAGA 1560 42 WO 95/16694 CGAAGAAGAG AAAGAAGCTA GGCATACAAA TTCACCAAGG AAAAGAGGCC CTIUS94/14563 ACTAGGTAAA GATGATATTG ACCTAAAGAT CCTAGTAAAC TACTCCTGTG ATTAGGAAAT CCCGTTTTCG AAAAGATTTA ATTTTACGGA AATTCTTCGG CCAAAAGCAT CAGATTGTAG GTATGTCAAA GAAGAAATAT GAATGAATTC GCCTCAP.TTT TACTATATAC GTGGCTGGTA AAAGGAACTA CTATCGTCTT AAATGGATTG AAAATGGTAA AGGAGAAAGG TTAACATGGG TCCAAAAAAT AAGAAGAAAA GAAATCTCAA GATATTATGT TATTGTCATT GCAGTAGCCA TACTTCAAGA ATTGGGTTTA AAATATCATT GATTTAA GAC AACTGGCAAT GCTATTTTGA GCCTGAAGAC GTGAGGAAAG GAGAAAAGTA GCAAGTGTTA AGCTGAAATT GCTGAAAAAC GGTTATTGAA GATGAAAATG AAGTAACAAA GCCAAAGACG TCACATCACG CACGTTATGA TATGACGCGA CTTTCATTTA AAAGAACGGA TCTCAAGAGC TGAAGTAAAT GGCAGTAATA AAGTGATAAT ATTTCTGAAC ACTTGACGCG GGAATATTGT GCTAAATATA TCAGTAGAAG TTTATAGATT CGGTTTTTAT AGCGCATTTA TCCAAAACAT

ACGCTTCTGT

'J3CGTCAGAT

TTACAAAAAA

AATCTATAGC

AATTGATGGC

AAACAATTTT

TGAAGTCTGC

ATTTAAGTGC

CGCCTGGTGT

CTGCACAACG

AACCCACAGA

CAGCTTCAAT

CCATTGTAGT

ACAATTTTTT

ATACAATGGA

CCAGAATTAT

TGAAGGGGTT

TTGATGCGGC

TTCGCTTAAG

GTGGTGATGC

ACTATATGGC

AGGAGCAAAT

ATAATGATGA

AAGCCTTAAA

CAGCAAAACT

AAGAACTGGG

AGTTTGTCAT

AATTGAGAAT

TTAAACAAAC

AAGCCAAAAG

TATTCATGAC

ACGATATTGT

ACAACCTCCC

GCTATTGATA

GAATGTTGCT

TGCAATACCA

ATCAAGGTTT

TTCTAAAGTC

AAATTTCCAA

ATATAGTGCC

AGGGAAAACT

AGAAATACCA

CTGTTACGAA

GGAGTCACTA

CTTGTTGGAC

CAATTGGACT

CTTACCAGAA

GTTCACTGGG

GAACGACTCA

TGG.,A, "'GAC

AATGACGTGCT

AAGAAGAGCA

TAAGCATGGT

ATACGATGAT

CGATGATGAC

CGAAACTTTA

GTTTATTTAT

CGATATTGTC

GGAGATAGCC

TATATCATGG

TATGAAGAAC

AGCCATGAAT

CTAGCATACA

GGATGTACAT

CCCAAAAAAA

TCTTCATGCC

AGGGCGAAAA

GATTTAACTT

GAAAACAAAT

AAAAAACAGT

GATTATTTAC

ATTGAGTCCG

TTAACCGTAA

GACTTTCTTT

ACTTTATGGA

GAGTTTTACT

GAACTCGATG

ACTTACGAAA

CGTCAGCTAG

TATACGCACG

TTTTTCTATG

ACTACAGTTA

GATGCCATTG

TTGAAGGTTT

TATGGATATG

GAAGACAAGG

AATGATGGGG

AATTCTCATG

GCATTATTAA

GATGAAATCA

AAAACATTGT

GATTTCGTTC

GATAGAATAT

GAGGATGAGA

CATACATATA

ACCTTCTATA

GTAAGATAAA 1620 GAGGTCGTAA 1680 GTGCAAATAA 1740 AGAAATATAC 1800 CATTACCTGA 1860 TAAAAACAAC 1920 TGAACTCTTC 1980 CGGCTAGGGA 2040 ACTCCGCTAC 2100 GGGAAGTCGT 2160 ATGTGGAAAT 2220 ACAAAGTGTC 2280 TTAAAAGAGT 2340 CCATGGTAAC 2400 ATGCCAAACT 2460 GCAATAAGAT 2520 AAGAGCTAALA 2580 TTGATACAAA 2640 AGCAAACGTT .2700 'AAATAGCTTC 2760 GTAAAAGAGC 2820 ATGGAAAGAC 2880 ATCTTATTGA 2940 TACAAACAGT 3000 TAATTACGTT 3060 ACTTGATGAA 3120 AGTTACTTAT 3180 TCCAACAGGG 3240 TCAATCAGTT 3300 GTTGTGTCAA 3360 CATTGAGAAA 3420 CCTACATAGT 3480 TCTCCTTAAA 3540 GCTATTGTGT AGCTTGATTT AAAATATGCT AACGCCAACT CTCACATGGT AGCAGGCGGG 3600 WO 95/16694 WO 9516694PCr/US94/14563

TATAGTTGTT

GGCTTCCTGA

CCACCTAGGT

AATGTCAAAT

TTGGCAGTAG

CTCGAGATTT

ACCAAAATAG

GAACTCAACG

GTCGCAATTG

ATAATATCCG

GTGTATTCCT

TGTGCATCCC

TTGTTTGCTC

AAATCCAAAC

TCTATAATCC

ACCTTTTTTG

CCATACACCA

AGAATTATTC

AGAGTGCTAC

GGTGTGAAGC

GTGGTATTTT

TGAGGAAACC

TTCATGTATT

TATTATGGCT

GCTTATATAT

TAAAGATCTT

AGCAGAATAT

GTTCCTGATA

GAATTGCCGA

TCTCTGGACG

GGAGCATGTT

TGGAGCGTAT

TATCGTATAT

CTACTAATCC

CCGTGCTTGG

GTAATATCTA

AATTTATATT

CTTTTTCATA

TACACCATAC

CTGCAGGAGC

TGAGTAAATG

CGCCTCGGCC

GTTTCACAAT

GATATATGTG

AAGCCAAAAA

AACGCCCGGC

CTTTCTATCC

CAAAAGAGGA

TGCCAGTGCA

AAGAGGAGCA

TTCCGGGTCT

ATCATTTAGC

TGTCAGGTCA

GATGAATTCT

GCTTACTTTT

GTTCTGTACG

AACCTGAACT

TGTAGCCATC

TATTTTTCTC

ATTTTTTCCC

AAATAATTTC

ACCATAGCAC

TCCACTGAAA

GGAGGACGCG

GGCTGGACTC

ACCAGTGTCC

AGTGGTAGCA

GATGGTGCCT

TGACTTTTGT

TCGCCGATTT

ATTTTGAAAA

TTCATGACCT

AAGTCTATTA

CCGTACGCCT

AACAG, AiATA

CTTTGTCCGC

CTTTTCAAAA

TACTTCACCA

TCAACCTGAT

TTAGCTTAAC

GAAAAACTGA

TTCTGGGTTC

TCTAGATTTG

CAGTACACTA

AAAAAAGAGC

GTAGATCCAG

TCCAGGCCGG

TCATTAGTGA

GATTTGAACT

TAGATGAGGG

TATGATGTCG

CATTGATTTC

TTTTTTGAAT

GTGCATACTT

GTAAATCGTA

GCCTATACCA

TGATCACTGA

CTAAATCCAT

AGTTCACTCC

TCAGCGATGT

TTCGTACCGC

TCAATTTAAT

GGACAAGAGC

TTTTCTTCCT

AAGACAGCAT

TATTTTTATG

AGCATGGATG

TGTGGAATCA

AGTGATGATT

GTTCCAATGT

TAGTTAGTTG

CGACGAGGAG 3660 ATGTTGCTGG 3720 TGGGA1 ,,TT 3780 GTTTATAGAT 3840 CATACTCGTT 3900 CTTTGTGCCC 3960 CTCCTTTATT 4020 AGACCCTACC 4080 TATAGAAAAT 4140 CAGCGTCTGT 4200 TTTCCCTACT 4260 AGGTATAGAA 4320 TTCTACAGCA 4380 CAATCAATCA 4440 TTTCTTGTTT 4500 TTTTGTACAT 4560 AATTTTACTA 4620 TCATGTCGGT 4680 AGGTGGTGCC 4740 GCCACGCTGA 4800.

ATAGTTAGTA 4860 TATTCGCCTT 4920 4940 INFORMATION FOR SEQ ID NO:2: SEQUENCE CHARACTERISTICS: LENGTH: 914 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: Met Ala Lys Thr Leu Lys Asp Leu Gln Gly Trp Glu Ile Ile Thr Thr 1 5 10 Asp Glu Gln Gly Asn Ile Ile Asp Gly Gly Gln Lys Arg Leu Arg Arg 20 25 Arg Gly Ala Lys Thr Glu His Tyr Leu Lys Arg Ser Ser Asp Gly Ile 40 WO 95/16694 PCT/US94/14563 Lys Leu Gly Arg Gly Asp Ser Val Val Met His Asn Glu Ala Ala Gly 55 Thr Tyr Ser Val Tyr Met Ile Gln Glu Leu Arg Leu Asn Thr Leu Asn 65 70 75 Asn Val Val Glu Leu Trp Ala Leu Thr Tyr Leu Arg Trp Phe Glu Val 90 Asn Pro Leu Ala His Tyr Arg Gln Phe Asn Pro Asp Ala Asn Ile Leu 100 105 110 Asn Arg Pro Leu Asn Tyr Tyr Asn Lys Leu Phe Ser Glu Thr Ala Asn 115 120 125 I Lys Asn Glu Leu Tyr Leu Thr Ala Glu Leu Ala Glu Leu Gin Leu Phe 130 135 140 145 150 155 160 Leu Lys Gly Asn Val Asp Pro Glu Arg Asp Phe Thr Val Arg Tyr Ile 165 170 175 Cys Glu Pro Thr Gly Glu Lys Phe Val Asp Ile Asn Ile Glu Asp Val 180 185 190 Lys Ala Tyr Ile Lys Lys Val Glu Pro Arg Glu Ala Gin Glu Tyr Leu 195 200 205 Lys Asp Leu Thr Leu Pro Ser Lys Lys Lys Glu Ile Lys Arg Gly Pro 210 215 220 J Gin Lys Lys Asp Lys Ala Thr Gin Thr Ala Gin Ile Ser Asp Ala Glu 225 230 235 240 Thr Arg Ala Thr Asp Ile Thr Asp Asn Glu Asp Gly Asn Glu Asp Glu 245 250 255 Ser Ser Asp Tyr Glu Ser Pro Ser Asp Ile Asp Val Ser Glu Asp Met 260 265 270 Asp Ser Gly Glu Ile Ser Ala Asp Glu Leu Glu Glu Glu Glu Asp Glu 275 280 285 Glu Glu Asp Glu Asp Glu Glu Glu Lys Glu Ala Arg His Thr Asn Ser 290 295 300 Pro Arg Lys Arg Gly Arg Lys Ile Lys Leu Gly Lys Asp Asp Ile Asp 305 310 315 320 Ala Ser Val Gin Pro Pro Pro Lys Lys Arg Gly Arg Lys Pro Lys Asp 325 330 335 Pro Ser Lys Pro Arg Gin Met Leu Leu Ile Ser Ser Cys Arg Ala Asn 340 345 350 Asn Thr Pro Val Ile Arg Lys Phe Thr Lys Lys Asn Val Ala Arg Ala 355 360 365 Lys Lys Lys Tyr Thr Pro Phe Ser Lys Arg Phe Lys Ser Ile Ala Ala 370 375 
380 Ile Pro Asp Leu Thr Ser Leu Pro Glu Phe Tyr Gly Asn Ser Ser Glu 385 390 395 400 Leu Met Ala Ser Arg Phe Glu Asn Lys Leu Lys Thr Thr Gin Lys His 405 410 415 WO 95/16694 PCT[US94/14563 Gin Ile Val Giu Thr Ile Phe Ser Lys Val Lys Lys Gin Leu Asn Ser 420 425 430 Ser Tyr Val Lys Giu Giu Ile Leu Lys Ser Ala Asn Phe Gin Asp Tyr 435 440 445 Leu Pro Ala Arg Glu Asn Glu Phe Ala Ser Ile Tyr Leu Ser Ala Tyr 450 455 460 Ser Ala Ile Giu Ser Asp Ser Ala Thr Thr Ile Tyr Val Ala Gly Thr 465 470 475 480 Pro Gly Val Gly Lys Thr Leu Thr Val Arg Giu Val Val Lys Glu Leu 485 490 495 Leu Ser Ser Ser Ala Gin Arg Glu Ile Pro Asp Phe Leu Tyr Val Giu 500 505 510 Ile Asn Giy Leu Lys Met Val Lys Pro Thr Asp Cys Tyr Giu Thr Leu 515 520 525 Trp Asn Lys Val Ser Gly Glu Arg Leu Thr Trp Ala Ala Ser Met Giu 530 535 540 Ser Leu Glu Phe Tyr Phe Lys Arg Val Pro Lys Asn Lys Lys Lys Thr 545 550 555 560 Ile Val Vai Leu Leu Asp Giu Leu Asp Ala Met Val Thr Lys Ser Gin 565 570 575 Asp Ile Met Tyr Asn Phe Phe Asn Trp Thr Thr Tyr Glu Asn Ala Lys 580 585 590 Leu Ile Val Ile Ala Val Ala Asn Thr Met Asp Leu Pro Glu Arg Gin 595 600 605 Leu Gly Asn Lys Ile Thr Ser Arg Ile Gly Phe Thr Arg Ile Met Phe 610 615 620 Thr Gly Tyr Thr His Glu Giu Leu Lys Asn Ile Ile Asp Leu Arg Leu 625 630 -635 640 Lys Gly Leu Asn Asp Ser Phe Phe Tyr Val Asp Thr Lys Thr Gly Asn 645 650 655 Ala Ile Leu Ile Asp Ala Ala Gly Asn Asp Thr Thr Val Lys Gin Thr 660 665 670 Leu Pro Giu Asp Val Arg Lys Val Arg Leu Arg Met Ser A).a Asp Ala 675 680 685 Ile Giu Ile Ala Ser Arg Lys Val Ala Ser Val Ser Gly Asp Ala Arg 690 695 700 Arg Ala Leu Lys Val Cys Lys Arg Ala Ala Glu Ile Ala Glu Lys His 705 710 715 720 Tyr Met Ala Lys His Gly Tyr Gly Tyr Asp Gly Lys Thr Val Ile Giu .7-95 730 735 Asp Giu Asn Giu Giu Gin Ile Tyr Asp Asp Glu Asp Lys Asp Leu Ile 740 745 750 Giu Ser Asn Lys Ala Lys Asp Asp Asn Asp Asp Asp Asp Asp Asn Asp 755 760 765 Gly Val Gin Thr Val His Ile Thr His Val Met Lys Ala Leu Asn Giu 770 775 780 46 .WO095/16694 PCTIUS94/14563 Arg Leu Ser 
Phe TI Thr 785 Leu Asn Ser His Val 790 Tyr Ile Thr Phe Met Thr 795 Leu Ala Lys Leu Phe Ile 805 Ala Leu Leu Asn 810 Met Lys Lys Ser Gin Giu Ile~ Giu Vai 835 Leu Phe Gin Gin 820 Asn Giu Leu Gly Asp Vai Asp Giu Ile Gly Ser Asn Vai Met Giu Lys Leu Leu 830 Aia Lys Thr Arg Ile Ile Gin Gly Ser 850 Ser Trp 865 Asp 855 Asn Ile Ser Giu Gin 860 Ala Asp Phe Vai Leu 870 Gin Leu Leu Gly Ile Leu Lys Gin Thr Met Asn Asp Arg Ile Vai Lys Leu Asn Ile 895 Leu Arg Ser Val Giu Aia Lys Arg Ala Giu Asp Giu Asn Leu INFORMATION FOR SEQ ID NO:3: SEQUENCE CHARACTERISTICS: LENGTH: 2809 base pairs TYPE: nucieic acid STRANDEDNESS: double TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 807. .2666 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: GAGCTCAACA CCACCATTGA ATGAACTCTC CTAGCAATGT

CATACTTGGC

TTATATCAAG

AACCGGCCAG

AGCTTTTTGG

ATTTCTCCAG

TTAACTTGCT

ATGGATCAGC

ACTCACCGCA

GTGTAAAAGT

GCTGAAATTG

CAAAAATTCA

AGTCTGTTTC

GAAATTAGTC

CCTCAATTTG

TGGCCTCATC

GTGGCACCTT

AAAGGCCAAA

TAGTACCAAT

TCATTGAGGT

TATAAAGAAA

TATTTGATAA

GAACGTAGAA

GAAACTTCTC

GGATTGAATA

TTAATTGGAT

TTTTCGAAGC

TATTCCCTTA

TAGGTGTAAA

TATATGTAAT

TATCCATGCA

TTCTAGTCTA

GGAAATGACA

TAGGAACCAC

TTGATCATTG

TTTCAATTTT

TTAAGGGAAA

TATATAATCG

TTGCTGTGAT

TGGTTTTGGT

ATACGCTTCT

CTAGCAATAG

ATGAACCATC

TAAATATCGA

GCAAAATCGG

ATAGTAAGCA

CTTTAAATTA

ATCTTATTTG

TAAGCTGATT

TTTTCGCCTT

GAACTTGTAT

CTAGTATTGA

TTCGCAAGAG

TCAACTCTGT

CGTCACTAGC

TTTCAATGGA

CTTATTCGCG

GAAAATTTTT

GAATTGTTAT

AGACAAAGTA

CTATATCTTT

CTCTTTCTGC

TTTGAATGGG

GGATAAAAAT

GATGACTATA

TCTTTTTGAC

CTTAGAGACC

TGCCGTGACA

TCATAAGAAT

TAAATGTGAT

CAGAACACCC

TCTTCACAAT

GAATATATTA

AAAACAAGTT

WO095/16694 PCTJUS94/14563 TTTGTAGTAC TGCGAATTGC CATAAC ATG CTA AAT GGG GAA GAC TTT GTA GAG 833 Met Leu Aen Gly Glu Asp Phe Val Glu 1 CAT AAT His Asn AAA AGG 0O Lys Arg TCA AAG Ser Lys AAA AAT Lys Aen GCT CCC Ala Pro GAT AGG Asp Arg AGG AAA Arg Lys ACT TCT Thr Ser AGA GAA Arg Glu GTG ACT Val Thr 155 CCA GAA Pro Glu 170 TTT ACT Phe Thr AAA GAC Lye Asp CCA ACC Pro Thr AAG TCA Lys Ser 235 AGA AAA

GAT

Asp

GTT

Val

AAG

Lys

ACA

Thr

CGT

Arg

ATC

Ile

TTG

Leu

AAC

Asn

CGC

Arg 140

CCA

Pro

CCT

Pro

TCG

Ser

TCA

Ser

CCT

Pro 220

GCA

Ala

ATT

ATC

Ile

GAC

Asp

AAT

Asn 45

TCT

Ser

A

Lys

AAG

Lys

GAC

Asp

AAC

Asn 125

GAA

Glu

CAA

Gln

GCA

Ala

CCC

Pro

ACC

Thr 205

GTA

Val

AGC

Ser

GTC

CTA

Leu

CCA

Pro

TTG

Leu

CCA

Pro

CGT

Arg

AAG

Lys

A

Lys 110

AAG

Lys

A

Lys

ACT

Thr

ACA

Thr

CTA

Leu 190

TCC

Ser

CCG

Pro

TCG

Ser

AGA

TCG

Ser 15

CAT

His

TTG

Leu

GAT

Asp

GGA

Gly

GAT

Asp 95

GAT

Asp

CAG

Gln

ATA

Ile

GAT

Asp

CCA

Pro 175

AAG

Lys

CCA

Pro

A

Lys

TTT

Phe

ACT

TCT

Ser

GGA

Gly

GAA

Glu

CCG

Pro

AGA

Arg 80

GAG

Glu

ACA

Thr

GTG

Val

CAG

Gln

GAT

Asp 160

TCT

Ser

CAA

Gln

GGT

Gly

AAT

Asn

TTG

Leu 240

AAT

CCC GCA AAA AGC AGG AAT GTA ACC CCA Pro

GAA

Glu

AGA

Arg

GCA

Ala 65

CCA

Pro

AAA

Lys

TCA

Ser

ATG

Met

GTA

Val 145

AAT

Asn

AAG

Lys

ATT

Ile

A

Lys

A

Lys 225

GAT

Asp

CG

Ala

AGA

Arg

ATC

Ile 50

CTC

Leu

AGA

Arg

GAT

Asp

GGT

Gly

GAA

Glu 130

GC

Ala

TTT

Phe

AAG

Lys

ATA

Ile

TTA

Leu 210

MAC

Lys

ACT

Thr

AAG

Lys Lys

CAA

Gln 35

TCG

Ser

A

Lys

AAG

Lys

ACA

Thr

AAT

Asn 115

AAG

Lys

ACC

Thr

GTA

Val

TCT

Ser

ATG

Met 195

ACC

Thr

CTC

Leu

TTT

Phe

TCA

Ser Ser 20

CTG

Leu

CTT

Leu

CCT

Pro

ATA

Ile

ATT

Ile 100

GTC

Val

ACO

Thr

ACA

Thr

TCA

Ser

TTA

Leu 180

AAT

Asn

TTG

Leu

TAC

Tyr

GAA

Glu

AGG

Arg 260 Arg

AGA

Arg

GTA

Val

A

Lys

CAG

Gln 85

TC

Ser

AAT

Asn

CG

Gly

ACA

Thr

AAT

Asn 165

ACC

Thr

AAT

Asn

AGT

Ser

CAA

Gln

GGA

Gly 245

CAC

His Asn

AGA

Arg

GGC

Gly

ACG

Thr 70

GAA

Glu

TCT

Ser

GAG

Glu

ATA

Ile

TAT

Tyr 150

TCA

Ser

ACT

Thr

TTA

Leu

AGA

Arg

ACT

Thr 230

TAT

Tyr

ACC

Thr Val

ATT

Ile

MOC

Asn 55

CCA

Pro

GAA

Glu

AAG

Lys

GAA

Glu

AAA

Lys 135

GAA

Glu

CCC

Pro

AAT

Asn

A

Lys

AAT

Asn 215

TCG

Ser

TTC

Phe

ATG

Met Thr

CAT

His 40

GAA

Glu

ACT

Ser

TTA

Leu

AAA

Lys

AGC

Ser 120 C- A G Ciu

GAT

Asp

GAG

Glu

CAT

His

GAA

Glu 200

TTT

Phe

GAA

Glu

GAO

Asp

TCA

Ser

TCA

Ser

AGG

Arg

A

Lys

ACT

Thr

AAG

Lys 105

AAG

Lys

A

AAT

Asn

CCA

Pro

GAT

Asp 185 T \T

ACT

Thr

ACC

Thr

CAA

Gln

ATG

Met 265 881 929 977 1025 1073 1121 1169 1217 1265 1313 1361 1409 1457 1505 1553 1601 4i Aoj Lye Ile Val Arg Thr Asn Ala WO95/16694 PCTIUS94/14563 GCA CCT GAC GTT ACC tGA GAA GAG TTT TCC CTA GTA TCA AAC TTT TTC 1649 Ala Pro Asp Val Thr Arg Glu Glu Phe Ser Leu Val Ser Asn Phe Phe 270 275 280 AAC GAA AAT TTT CAA AAA CGT CCC AGG CAA AAG TTA TTT GAA ATT CAG 1697 Asn Glu Asn Phe Gin Lys Arg Pro Arg Gin Lys Leu Phe Glu Ile Gin 285 290 295 AAA AAA ATG TTT CCC CAG TAT TGG TTT GAA TTG ACT CAA GGA TTC TCC 1745 Lye Lys Met Phe Pro Gln Tyr Trp Phe Glu Leu Thr Gin Gly Phe Ser 300 305 310 TTA TTA TTT TAT GGT GTA GGT TCG AAA CGT AAT TTT TTG GAA GAG TTT 1793 Leu Leu Phe Tyr Gly Val Gly Ser Lys Arg Asn Phe Leu Glu Glu Phe 315 320 325 GCC ATT GAC TAC TTG TCT CCG AAA ATC GCG TAC TCG CAA CTG GOT TAT 1841 Ala Ile Asp Tyr Leu Ser Pro Lye Ile Ala Tyr Ser Gin Leu Ala Tyr 330 335 340 345 GAG AAT GAA TTA CAA CAA AAC AAA CCT GTA AAT TCC ATC CCA TGC CTT 1889 Glu Aen Glu Leu Gin Gin Asn Lys Pro Val Asn Ser Ile Pro Cys Leu 350 355 360 ATT TTA AAT GGT TAC AAC CCT AGC TGT AAC TAT CGT GAC GTC TTC AAA 1937 Ile Leu Asn GLy Tyr Asn Pro Ser Cys Asn Tyr Arg Asp Val Phe Lys 365 370 375 GAG ATT ACC GAT CTT TTG GTC CCC GCT GAG TTG ACA AGA AGC GAA ACT 1985 Glu Ile Thr Asp Leu Leu Val Pro Ala Glu Leu Thr Arg Ser Glu Thr 380 385 390 AAG TAC TGG GGC AAT CAT GTG ATT TTG CAG ATC CAA AAG ATG ATT GAT 2033 Lys Tyr Trp Gly Asn His Val Ile Leu Gln Ile Gin Lys Met Ile Asp 395 400 405 TTC TAC AAA AAT CAA CCT TTA GAT ATC .AAA TTA ATA CTT GTA GTG CAT 2081 Phe Tyr Lys Asn Gin Pro Leu Asp Ile Lys Leu Ile Leu Val Val His 410 415 420 425 AAT CTG GAT GGT CCT AGC ATA AGG AAA AAC ACT TTT CAG ACG ATG CTA 2129 Asn Leu Asp Gly Pro Ser Ile Arg Lys Asn Thr Phe Gin Thr Met Leu 430 435 440 AGC TTC CTC TCC GTC ATC AGA CAA ATC GCC ATA GTC GCC TCT ACA GAC 2177 Ser Phe Leu Ser Val Ile Arg Gin Ile Ala Ile Val Ala Ser Thr Asp 445 450 455 CAC ATT TAC GOT CCG CTC CTC TGG GAC AAC ATG AAG GCC CAA AAC TAC 2225 His Ile Tyr Ala Pro Leu Leu Trp Asp 
Asn Met Lys Ala Gin Asn Tyr 460 465 470 AAC TTT GTC TTT eAT GAT ATT TCG AAT TTT GAA CCG TCG ACA GTC GAG 2273 Asn Phe Val Phe His Asp Ile Ser Asn Phe Glu Pro Ser Thr Val Glu 475 480 485 TCT ACG TTC CAA GAT GTG ATG AAG ATG GGT AAA AGC GAT ACC AGC AGT 2321 Ser Thr Phe Gin Asp Val Met Lye Met Gly Lye Ser Asp Thr Ser Ser 490 495 500 505 GGT GCT GAA GGT GCG AAA TAC GTC TTA CAA TCA CTT ACT GTG AAC TCC 2369 Gly Ala Glu Gly Ala Lys Tyr Val Leu Gin Ser Leu Thr Val Asn Ser 510 515 520 AAG AAG ATG TAT AAG TTG CTT ATT GAA ACA CAA ATG CAG AAT ATG GGG 2417 Lys Lys Met Tyr Lys Leu Leu Ile Glu Thr Gin Met Gin Asn Met Gly 525 530 535 49 Vi WO 95116694 PCTIUS94/14563 AAT CTA TCC GCT AAC ACA GGT CCT AAG CGT GGT ACT CAA AGA ACT GGA 2465 Aen Leu Ser Ala Aen Thr Gly Pro Lye Arg Gly Thr Gin Arg Thr Gly 540 545 550 GTA GAA CTT AAA CTT TTC AAC CAT CTC TGT GCC GCT GAT TTT ATT GCT 2513 Val Glu Leu Lys Leu Phe Asn His Leu Cys Ala Ala Asp Phe Ile Ala 555 560 565 TCT AAT GAG ATA GCT CTA AGG TCG ATG CTT AGA GAA TTC ATA GAA CAT 2561 Ser Asn Glu Ile Ala Leu Arg Ser Met Leu Arg Glu Phe Ile Glu His 570 575 580 585 AAA ATG GCC AAC ATA ACT AAG AAC AAT TCT GGA ATG GAA ATT ATT TGG 2609 Lye Met Ala Asn Ile Thr Lye Aen Aen Ser Gly Met Glu Ile Ile Trp 590 595 600 GTA CCC TAC ACG TAT GCG GAA CTT GAA AAA CTT CTG AAA ACC GTT TTA 2657 Val Pro Tyr Thr Tyr Ala Glu Leu Glu Lys Leu Leu Lys Thr Val Leu 605 610 615 AAT ACT CTA TAAATGTATA CATATCACGA ACAATTGTAA TAGTACTAGG 2706 Asn Thr Leu 620 CTTGCTAGCT TTGCTTTCCC ATAACCAACA ATACTTAGTG ATGTATCTTA AAACGACTAA 2766 AAAACTTCTC ATATAACCCT ACTGAAAAAC GTCTGATGAG CTC 2809 30 INFORMATION FOR SEQ ID NO:4: SEQUENCE CHARACTERISTICS: LENGTH: 620 amino acids TYPE: amino acid TOPOLOGY: linear i


(ii) MOLECULE TYPE: protein Met 1 Pro Glu Arg Ala Pro Lys Lys Ser Met (xi) SEQUENCE Leu Asn Gly Glu 5 Ala Lye Ser Arg 20 Arg Gin Leu Arg Ile Ser Leu Val Leu Lys Pro Lye Arg Lys Ile Gin Asp Thr Ile Ser 100 Gly Asn Val Asn 115 Glu Lys Thr Gly 130 DESCRIPTION: SEC: ID Asp Phe Val Glu His 3.10 Asn Val Thr Pro Lys Arg Ile His Ser Ser Gly Asn Glu Arg Lys Thr Pro Ser Lys Ala 70 Glu Glu Leu Thr Asp 90 Ser Lye Lys Lys Arg 105 Glu Glu Ser Lys Thr 120 Ile Lye Glu Lys Arg 135 NO: 4: Asn A Arg V Lys L Asn T Pro A Arg I Lye L Ser A Glu A Ser His Lau Asp Gly Asp Asp Gin Ile WO 95/16694 WO 9516694PCT1US94/14563 Val 145 Asn Lys Ilie Lys Lys 225 Asp Ala Giu Pro Trp 305 Ser Lys Lye Ser Pro 385 Ile Asp Arg Gin Trp 465 Ser Ala Phe Lys Ile Leu 210 Lye Thr Lys Phe Arg 290 Phe Lye Ile Pro Cys 370 Ala Leu Ile Lye Ile 450 Asp Asn Thr Thr Thr Tyr 15&, Giu Asp Asn Val Thr Pro Gin Thr Asp Val Ser Met 195 Thr Leu Phe Ser Ser 275 Gin G iu Arg Ala Val 355 Asn Glu Gin Lye Aen 435 Ala Asn Phe Ser Leu 180 Asn Leu Tyr Giu Arg 260 Leu Lys Leu Asn Tyr 340 Asn Tyr Leu Ile Leu 420 Thr Ile Met Giu Lye 500 Asn 165 Thr Aen Ser Gin Gly 245 His Val Leu Thr Phe 325 Ser Ser Arg Thr Gin 405 Ile Phe Val Lys Pro 485 Ser Thr Leu Arg Thr 230 Tyr Thr Ser Gin 310 Leu Gin Ile Asp Arg 390 Lye Leu Gin Ala Ala 470 Ser Pro Aen Lye Aen 215 Ser Phe Met Asn Giu 295 G ly G iu Leu Pro Val 375 Ser Met Val Thr Ser 455 Gin Thr Giu His Giu 200 Phe Giu Asp Ser Phe 280 Ile Phe G iu Al a Cys 360 Phe Giu Ile Vai Met 440 Thr Asn Val Pro Asp 185 Tyr Thr Thr Gin Met 265 Phe Gin Ser Phe Tyr 345 Leu Lys Thr Asp His 425 Leu Asp Tyr Giu Ser 505 Pro 170 Phe Lye Pro Lye Arg 250 Ala Asn Lys Leu Ala 330 Giu Ile Giu Lye Phe 410 Asn Ser His Asn Ser 490 Giy Giu Pro Thr Ser Asp Ser Thr Pro 220 Ser Ala 235 Lye Ile Pro Asp Giu Asn Lye Met 300 Leu Phe 315 Ile Asp Asn Giu Leu Aen Ile Thr 380 Tyr Trp 395 Tyr Lye Leu Asp Phe Leu Ile Tyr 460 Phe Val 475 Thr Phe Ala Glu Ala Pro Thr 205 Val Ser Val Val Phe 285 Phe Tyr Tyr Leu Gl~y 365 Asp Gly Asn Gly Ser 445 
Ala Phe Gin Gly Thr Leu 190 Ser Pro Ser Arg Thr 270 Gin Pro Gly Leu Gin 350 Tyr Leu Asn Gin Pro 430 Val1 Pro His Asp Ala 510 Pro 175 Lye Pro Lye Phe Thr 255 Arg Lye Gin Val Ser 335 Gin Asn Leu His Pro 415 Ser Ile Leu Asp Val 495 Lye Asp 160 Ser Gin Gly Asn Leu 240 Asn Giu Arg Tyr Gly 320 Pro Asn Pro Val Vai 400 Leu Ile Arg Leu Ile 480 Met Tyr


Lys Met Gly Ser Asp Thr Ser 2 WO 95/16694 Val Leu Gin 515 Ile Glu Thr 530 Ser Leu Thr Gin Met Gln Val Asn 520 Asn Met 535 Ser Lys Lys Met PCT/US94/14563 Lye Leu Leu Asn Thr Gly Giy Asn Leu Pro 545 His Lys Arg Giy Thr Gin 550 Asp Arg Thr Giy Vai Glu 555 Asn Lys Leu Phe Leu Cys Ala Al.a 565 Glu Phe Ile Ala Giu Ile Ala Leu Arg 575 Thr Lys Ser Met Aen Asn Leu Phe Ile Giu Lye Met Ala Asn Ser Gly Met Giu 595 Lye Leu Leu Lys Ile Ile 600 Thr Val 615 Trp Vai Pro Tyr Tyr Aia Giu Leu Giu 610 Leu Aen Thr INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 2759 base pairs TYPE: nucleic acid STRA~NDEDNESS: double TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID

TCTGAAATAA

TTCCAGCATG

TP iAAGTCAG

AAGAGTTCCT

ACGCTATAAT

AAAAATAAAA

GAAGGGTTGT

CATTACCTGC

TGATAATGGA

TAACGCCTTT

AAGAAAAAGL

AGTAGTCCAA

CTGACGCCCA

ATAAACACAT

AAAGATGGGA

TCGATAATAT

CTCAGAAAAG

AAAGTACAAA

TCTTTGCGCA

TTCTTTAGTA

GAAGTTTGTC

TTGTGTCATA

AAAGGTGGAA

CGTTGGTTGC

TGTTTTATTA

ACTAAGCATT

TTTAGTGTCT

AAAGTGACAA

CAGGATGAGC

AAGGAGCCAC

TCCCTTTGTC

ATTGTATCAT

TGAAGCAGAC

GCGATGCTTT

AAAGAAAACA

GATCCAAATC

GCATTCATCT

TACTGTGAAT

AAGAATTTTT

AAGACAATCT

GTTACGAGAC

TCTTTATATT

CTGTTAGATG

TTTTGATATT

AAGATGGCAT

GACCTTAACC

TATACAGTAT

AAACTTCTAT

CAGTTACATT

TTGAAAGCAG

AACACTATTT

ATATACCAGA

TTTCTTTGTC

TCTTGGTAAG

ATACTTTGCA

TGTAGAATAG

TTTCCAGAAA

AGGCTTGACA

TAGTAAGACC

GTAAGAATTT

T'eTGACGTA

TGTTTACATA

AATCCAAAAA

ACCCCAGTTT

CAGGCAAAGA

CCCACTTTCA

AGATTTCAGA

TCCTATTAGG

TATGAACCCT

TTGAAATTTA

TCTTTTTCTT

CATTTGTTTA

CTTTTTTTTT

CTTGAAACTA

ATTTCACAAG

AGCAGAAACG

TTTTTACCTT

TTTTTCCGCA

CAGAGTCGTA

GATGAACGTC

GCCTCAAAGT

ATCGGAAGTG

TGATCAAGTA

CCTTTTATAT

TTCAGATAGT

TTTAGTGAGA

TTCAGTAAAT

GTTTTTGAAA

ATTTTTAAAC

AATAGGAAAA

TACTGGAGAT

AGTAATGTTT

CTACACGTGA

CCATTACCAC

CCGTAATTTG

GTATCACAAG

AGCGAGTTTG

AACAAAAATG

AACGTGGAAA

GATCATATTA

AGTGAAACTA

ACGACAAAAA

120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020


WO 95/16694 PCT1US94/14563 TTGAACTTAA AGACGAATCT CTCCGAATGT AAGAATGATG CAGAAGAACA TCCAACTATC AGCAAAACAA TGATGTATCA GAAAAGACTT AGCAATGGTA TGGATAACTT CATAATTCTA TAATCTTTAA TATTAATACA TACGACTTCT GAAGAGAAAT ACGGAAACCA AATCTTTCAA ATCGTTTTGT GGAATTCATT TATTGACGAA GATGCTGGAT TATTCATTGA CCCTGTAAAT GATGTCCT1,C ATTCATGTTC AAATTCTTTC ATTATTGACA TGGTAAGAGA GAACCCGATT AATTGAATAT AACCAATTTT TAGACTCCTA TCTAGATCGT AACCCATTGA TACAATTTTT CCCAGTCGAT TTTCCCTTCT ACGTGCTGCC TTCGCTTGAT TAATGGCTCC GGTACTGGGT ACATTTACGA TTTCTACATT TCATAAGAAA AGATCCCTCC TTGACAAAGT AGCACTAATT TCATTAPAGTT TCAAAGCACC TTTAGATAAA GAATGCACGG AGAACCACGT TTTTGTAATG TTATTATTAT TATTATCGAA GCCATGATGC GCGAAGATTG

TCTCGCTACA

CTTCGTAGGT

AAGTATGAAG

TACGATCTGT

TTTAATTTTA

TTGAAAAGTG

AACTTGTCAA

TATCATAAAC

AGCTTTTTGG

CTCAGCAAGA

TATTCGTTGA

GTTGATTTTT

TTTGTCGAAG

AACAAAAACA

AACGGGCATG

AATCTGATAG

TGGTCAGCAT

CAAGAGCTAT

TACAAGTCAA

AAAGAAAATT

CAGCTATTCA

GCGTTCAGAG

AACACCAAAC

TTATTCATGC

AAGAGTTACG

ATAAATAAGT

AACAGTCTAC

TGGAGGGTAA

GCAATGGGAA

ACGTTTTGAT

CTATGTACAA

ACATTAACGA

CACTTGTGGA

AAGATGTAGA

CCTTCAAGTA

ATATTGAGAA

TAGACGTGTC

ATACGGTTGA

TGGCAAATAA

TGTCGTACTT

TGAACGACGA

GTCTTATAAA

GAGGCCTAGA

CTAAGTTTGT

AATTATATCA

GTAAAGAGTA

TTACTTTGGA

ATATCGAAGA

ATGATACTCT

AGCTTTATCG

AAACATTACC

TCTTAGAACT

AAGCAATCTI

ATCTGGTAGP

AAATAAATAP,

CTGTATCTCP

TATTATGTAI

ACTCAAGAACG

TGAATTGACT

ACTTTACAGC

TGAAGATGGC

AAACTTCAAA

TTCTATTAAC

TGACCATGTT

AAATTTGAGA

GAGTAATAAA

TGGCAAACTA

TACTAATCAC

TTTCCAGAAT

CTACTTAAAA

GCAGCATGCT

AGAGTTTTTT

TGCTCGATTC

TAATTTGCTT

TAAGGATCGG

CAACAGAAGT

TAACTTACTA

TTCTGGAGAT

TGAGGCGAAT

AAAAGAGGAA

AGCAGAAACA

CGCCTTTGAA

AAAATGTGTC

CCATACATAT

LTCATTTTTCT

AGGTAAAATI,

GCAGCAACAA

CCGAAAGAAT 1080 GCAGCTGATG 1140 GATTTTACCG 1200 AGGCTTTTTG 1260 TTCAACACAT 1320 AAAATAAGTT 1380 CAATCAACCA 1440 GGATTTAAGT 1500 AATCTTTCAG 1560 AACTTACAAT 1620 GCCTTTTCAG 1680 ATACTGAGCA 1740 CCTGCTGACG 1800 GTTGAGTTTT 1860 CTCGAAGAAG 1920 ATTGGCAAAC 1980 CTTCATTTTG 2040 GGATTACTTA 2100 AGTTGGGAGC 2160 TTGGATAAAA 2220 ATGACTATCA 2280 ATATTAAATT 2340 CCGGACGCAT 2400 AACATGGGTC 2460 TGGAGAGGAA 2520 ATAGAACCAT 2580 GTGTTAACTA 2640 AATAGATAGT 2700 AAAAATAAA 2 7 59 INFORMATION FOR SEQ ID NO:6: SEQUENCE CHARACTERISTIC~z LENGTH: 615 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: peptide WO 95/16694 PCTIUJS94/14563 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: Met Ser Asp Leu Asn Gin Sar Lys Lys Met Asn Val Ser Giu Phe Ala 1 5 10 Asp Ala Gin Arg Ser His Tyr Thr Val Tyr Pro Ser Leu Pro Gin Ser 25 Asn Lys Asn Asp Lys His Ile Pro Phe Val Lys Leu Leu Ser Gly Lys 35 40 Giu Ser Giu Val Asn Vai Giu Lys Arg Trp Giu Leu Tyr His Gin Leu 55 His Ser His Phe His Asp Gin Vai Asp His Ile Ile Asp Asn Ile Giu 70 75 Ala Asp Leu Lys Ala Giu Ile Ser Asp Leu Leu Tyr Ser Giu Thr Thr 90 Gin Lys Arg Arg Cys Phe Asn Thr Ile Phe Leu Leu Giy Ser Asp Ser 100 105 110 Thr Thr Lys Ile Giu Leu Lys Asp Giu Ser Ser Arg Tyr Asn Vai Leu 115 120 125 Ile Giu Leu Thr Pro Lys Giu Ser Pro Asn Val Arg Met Met Leu Arg 130 135 140 Arg Ser Met Tyr Lys Leu Tyr Ser Ala Ala Asp Ala Giu Giu His Pro 145 150 155 160 Thr Ile Lys Tyr Giu Asp Ile Asn Asp Giu Asp Gly Asp Phe Thr Giu 35165 170 175- Gin Asn Agn Asp Val Ser Tyr Asp Leu Ser Leu Val Giu Asn Phe Lys 130 185 190 Arg Leu Phe Giy Lys Asp Leu Ala Met Val Phe Asn Phe Lyo Asp Val 195 200 205 Asp Ser Ile Asn Phe Asn Thr Leu Asp Asn Ph Ile Ile Leu Leu Lys 210 215 220 Ser Ala Phe Lys Tyr Asp His Val Lys Ile Ser Leu Ile Phe Asn Ile 225 230 235 240 Asn Thk Asn Leu Ser Asn Ile Giu Lys Asn Leu Arg Gin Ser Thr Ile 245 250 255 Arg Leu Leu Lys Arg Asn Tyr His Lys Leu Asp Val Ser Ser Asn Lys 260 265 270 Gly Phe Lys Tyr Gly Asn Gin Ile Phe Gin Ser 
Phe Leu Asp Thr Val 275 280 285 Asp Gly Lys Leu Asn Leu Ser Asp Arg Phe Val Giu Phe Ile Leu Ser 290 295 300 Lys Met Aia Asn Asn Thr Asn HiJs Asn Leu Gin Leu Leu Thr Lys Met 305 310 315 320 Leu Asp Tyr Ser Leu Met Ser Tyr Phe Phe Gin Asn Aia Phe Ser Vai 325 330 335 Phe Ile Asp Pro Vai Asn Vai Asp Phe Leu Asn Asp Asp Tyr Leu Lys 340 345 350 WO 95/16694 PCTIUS94/14563 I I 14 7' 9 Ile Lys Asn 385 Pro Leu Ile Tyr Leu 465 Pro Val Leu Arg Arg 545 Pro Asp Asn Glu Gin His 370 Arg Giy Ile Asn Asn Ile Gly Lys 435 Lys Asp 450 Phe Thr Ser Tyr Leu Pro Asp Lys 515 Giu Ala 530 Glu Thr Ser Asn Lys Val Met Gly 595 Lys Cys 610 Ala Leu Gly Thr 420 Leu Arg Leu Lys Ser 500 Ile Asn Leu Thr Ala 580 Leu Val1 Pro Ala Giu Giu 390 His Ala 405 Asn Phe Asp Ser Leu His Asp Asn 470 Ser Asn 485 Leu Asp Met Ala met Thr Pro Lys 550 Lys Leu 563 Leu Ile Ile Lys Trp Arg Asp 375 Phe Lys Asn Tyr Phe 455 Arg Ile Lys Pro Ile 535 Glu Leu Leu Phe Gly 615 Giu Ile Leu Phe Val Giu Phe Val Ala 410 Leu Ile Giu 425 Leu Asp Arg 440 Giu Pro Ile Ser Gly Leu Giu Asp Asn 490 Glu Asn Tyr 505 Vai Leu Gly 520 Asn Ile Tyr Glu Ile Leu Giu'Leu Ala 570 Phe Met Gin 585 Gin Ser Thr 600 Leu Ser Arg Cys 355 Pro Thr Phe Met Phe Phe Val 360 Giu 365 Leu Vai Leu His Ala 445 Ile Gin Ser Leu Phe 525 Tyr I le Pro Phe Tyr 605 Gly Thr Arg Glu Asn 430 Cys Phe Ser Trp Ser 510 Lys Ile Arg Asp Ala 590 Asp Leu Asn Glu Giu 415 Leu Lys Gin Ile Giu 495 Gly Leu Ala Lys Ala 575 Phe Leu Ile Lys Asn 400 Glu Leu Glu Glu Phe 480 Gin Asp Tyr Phe Asp 560 Phe G lu Val INFORMATION FOR SEQ ID NO:7: SEQUENCE CHARACTERISTICS: LENGTH: 2404 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: CTCGAGGCCA CCAAGAAGAG AAAGAGAAGA GCCAGATATT GACTGGAGTG CAGCCAGAGG TTCCAACTTC CAAAGCTCCT CGGAGCCACC AAGAAGAGAA AGAGAAAAGG AAGAACCAGC 120 WO 95/16694 WO 9516694PCTIUS94/14563 TTTGGATTGG GGTGCTGCCA CTACAAGGAT AGGTCTCTAA GTCTGTTTAT GATGTTTTAC AAATGGAGAC GCAAAAGAAA TGCTCAATTG ACTGTTGAAG TGTATGATGA 
TAAAATGTAC TACTCTCCTT TCTACCAGGT TATTTTGTAT TAAGTTTCAT TGTGAGGTAA GTTTTTGAAT GCATTAACAA TTAAAAAAAA AACTACTAAT ATCGGTAATA CACCGCAAGT CAATCTTCTC CAGCGATTCT AAAAAAGCGT TTGGTTCCCT TCAAAGAAGG AGATAATCTT CACATATTTA CCATTATTCA GAAAGAGAGT CATACTTATT AGACTATGAA CTATCAGGTT GAATGGGTTT AATTGGAACA GCAGTTGCAG TAGAGACTAT TAGCAGTGGT ATTCGACCAC GAAGACAAGA

AGATAACAGT TGTTTTTATA

CTTTATTATA CAATCTTTTT GCTGCACAAC GAAATTAAAT CTCAAAGAGT GATTTATATG GAAATTTACT TACAGTTCGC TGGAAAAAGA ACTATCCGAC AAACCTTTAG GTCATTACCT AAAATTTTGG TTCACTCTGC AGAACCAACT ATCTAATAAT CCATTTTGAT CTCAGCCGCT ATTTAGCTTA TGCAGAGTAT TGGCTCCTAC TACAAATGTG AACTATGGTT GAAAAAGGAC

GAGGTGCTCA

CTAACAAAAA

GTACTGAAGA

ACAAAGTTGA

ATGGTGACAA

ATTTGTATTT

ATTCTAACTC

ACATGTGTTC

GTCCCATTTT

AAAAAAAATC

TTCAAAAGAA

CCAATAAAGA

ACTATAGATA

TTACTGCAGC

CAAGATTGTC

CATTCAGTAA

CTGTCTTTGT

ATTCACTCCG

AAAATTCATG

TCTTTGACAG

AATGAAGATA

TTCGATGAAA

GACATGGTAG

ATCTTGGAAT

CCGCAAATAC

TCTGAAATCT

CCTCGATCGA

ACATTGAAAA

ACTGCCATAA

TTAACAGGAA

AGGGTTGCCT

GAAAAGATGA

GGAACAGGTC

GTCAAGAACG

GTTTGGTAAG

GACTACTGAT

TGATGATGAA

TGCGGCAGTT

TTGGGAAGTT

ACTGTTTGCT

TATTATATAA

AAGTGTATTT

CCTTTCGTTT

TAAATAATAC

GAAGCATGAC

GGCACTCAAA

ATGAAAAGTG

AACTTTATGG

AACAAGAGAT

TTCTCGTGGG

TGCAACAATC

AACAAACAGC

GChGTGAAGA

AAGTGTTTGA

GTGGTGAGGT

TTGATACATT

AACATTCTCG

ATTTAGAAAA

AGAATCTAGA

CCCCCTGGGT

ATTTGAATAG

ATAGCATAAT

AATCGTGTTC

GGCTCCAATC

TAAGGGCGAA

TTAAAGCTAT

AAAGTACTTT

TTTGGGAAAA

CCTCAACAAA

GAGCAACCAA

GATGAAGAGG

GAAAAGCTAC

GTTGGTAAGA

TTTTTTCTTT

TTAAAAAAAA

TTGGATTTAT

TTGGAAAGTT

TGATAGAAAT

TATAAGCGAA

CGAAGAGGTA

TAAAGACAGC

CACACTTCCT

CGATAGAATC

GCCCAGACAA

TTATAAAGAG

TATTAACGGT

AAAAATTGAC

GAAAATTCTT

TGACAGAGAG

TGCTGGGCCT

GGTACCTGTT

GAGGGTAAAG

CGATATGGTT

TTCACAATGG

ACATATTAGG

TCCATTAGTA

TTTTCTTGAC

TTTATCCGAT

AGACGGATCT

CAACTCCAGA

TTCTATCGAC

TTTAGTGCAA

CCAAAAATAC 180 AAATCCAGALA 240 CTGAAAAGCA 300 AGGATAAAAC 360 AATAGAGTGT 420 CTTGTTTTTC 480 AATAACCATA 540 CATTTTTCTA 600 CTAAGAAAAA 660 ATCAAATATA 720 GCTCGTCTAT 780 GAGGAGACTG 840 GACCCTGGTT 900 ACGGACGAAA 960 ATTAAACAAT 1020 AGTTACAAAA 1080 CAGTTTATAA 1140 ATAGCAACTC 1200 GATACTTCAT 1260 TTACTCTTAG 1320 AGTATAACAA 1380 GTGAGGCAAA 1440 TGCATTTTTG 1500 AGTAGATTTT 1560 GACGCCGTCA 1620 AATGAAACGT 1680 ATGAATTTCG 1740 GCGACATCCA 1800 ATATACAATA 1860 TTAGAGTTAG 1920 TTTAATTTTA 1980 ATTCCCACCG 2040 AATACTATCA 2100 CTGGATTTTT 2160 WO095/16694 PCT1US94/14563 TTACCGAGAA ATCAGCCGTT GGTTTGAGAG ATAATGCGAC CGCAGCATTT TACGCTAGCA 2220 ATTATCAATT TCAGGGCACC ATGATCCCGT TTGACTTGAG AAGTTACCAG ATGCAGATCA 2280 TTCTTCAGGA ATTAAGAAGA ATTATCCCCA AATCTAATAT GTACTACTCC TGGACACAAC 2340 TGTGAATCTT GGGAACAATA TACAGACATT TTATTGGCGG TAGCAACTCT GATATTCCAC 2400 TGTT 2404 INFORMATION FOR SEQ ID NO:8: SEQUENCE CHARACTERISTICS: LENGTH: 529 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: Met i Ile Lys Phe Pro Giu Ser Asp Thr Gly 145 Giu Leu Lys Lys Pro 225 Thr Ile Ser Glu Ala Arg Leu Ser Pro Gin Val Asn Leu Leu Pro Lys Arg His Lys Arg Thr Giy Ser Leu Thr Asp Glu Ile Asp Arg Val Ile Leu 100 Tyr Glu Leu 115 Ile Arg Leu 130 Ile Ala Thr Giu Lys Ile Thr Giu Val 180 Thr Arg Asn 195 Ile Thr Val 210 Val Arg Gin Ser Asn Giu Giu Ile Gin Lys Ile Vai Ser Asn Gin Asp 165 Phe Giu Val Thr Asp Arg Ile 70 Ile Giy Leu Gly Leu 150 Asp Glu Asp Phe Leu 230 25 Lys Leu Thr Ser Gin i105 Gin His Gin Leu Leu 185 Giu Asp Asn 10 Giu Giu Thr Aia Ala Ile Leu Cys Lys Asp Ser Asp Pro Gly Gin Gin Leu Tyr GLy Thr Leu Tyr Leu Gin Asp Cys Gin Gin 75 Ile Ile Gin Lys Giu Ser His 90 Ser Tyr Lys Thr Tyr Leu Leu 110 Ser Tyr Lys Giu Gin Phe Ile 125 Ser Giu Gin Thr Ala Ile Asn 140 Leu Gin Lys Ile His Gly Ser 155 160 Giu Thr Ile Ser Ser Gly Ser 170 175 Leu Leu Leu Asp Ser Thr Thr 190 Val Asp Arg Giu Ser Ile Thr 205 Giu Ile Asp Thr Phe Ala Gly 220 Leu Phe Asp 
Met Val Giu His 235 240 111111~----- WO 95/16694 Ser Arg Val Pro Val Cys Ile Phe C 245 Leu Glu Tyr Leu Glu Lys Arg Val I 260 1 Ile Tyr Met Pro Gin Ile Gin Asn I 275 280 Arg Asn Leu Leu Thr Val Arg Ser 290 295 Trp Asn Glu Thr Leu Glu Lys Glu I 305 310 Asn Arg His ile Arg Met Asn Phe 325 Leu Lys Asn Ser Ile lie Pro Leu T 340 Ser Leu Cys Thr Ala Ile Lys Ser 355 360 Lys Asn Gin Leu Ser Asn Asn Leu 370 375 Asp Leu Glu Leu Ala Ile Leu Ile 385 390 Ala Lys Asp Gly Ser Phe Asn Phe i 405 Lys Met Ile Lys Ala Ile Asn Ser 1 420 Thr Asn ,Val Gly Thr Gly Gin Ser 435 440 Lys Leu Trp Leu Lys Lys Asp Val i 450 455 Gin Leu Asp Phe Phe Thr Glu Lys 465 470 Ala Thr Ala'Ala Phe Tyr Ala Ser 1 485 Ile Pro Phe Asp Leu Arg Ser Tyr 500 Leu Arg Arg Ile Ile Pro Lys Ser 515 520 Leu INFORMATION FOR SEQ ID NO:9: SEQUENCE CHARACTERISTICS: LENGTH: 2306 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA Thr Phe Met Pro 300 Pro Arg Ser Leu Leu 380 Arg Tyr Thr Ile Trp 460 Gly Phe Ile Tyr PCT/US94/14563 Lys Leu Asn Ile 255 Ser Gin Arg Val 270 Val Asp Ala Val 285 Trp Val Ser Gin Arg Ser Asn Leu 320 Ser Leu Pro Thr 335 Lys Asn Phe Gly 350 Asp Ile Tyr Asn 365 Gin Ser Leu Ser Val Ala Leu Arg 400 Ala Glu Tyr Glu 415 Val Ala Pro Thr 430 Asp Asn Thr Ile 445 Glu Asn Leu Val Leu Arg Asp Asn 480 Gin Gly Thr Met 495 Ile Leu Gin Glu 510 Ser Trp Thr Gin 525 WO 95/16694 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: PCTIUS94/14563 GCTATTTTTT CATGCGTCAG CTGATTCAAA AACTACGTTC AGATGTCATA AAGTGAAACG GAAGGAACTC TCCCCAGAGA AAGTATTPACG GTTTTGATAT AGTATATAAC GCGAGGTTCA AAGGTAAGGA AAAAGAGTAT AAATTAGCCC TTGAACATAA TTTTATTTTC TTCTTAATAC TTTAGGGAAT ATCAAACCAA CCTTCAAATT TAATCTTGCA TATTTTAATG CGAATCCPAA TGGAAGCCCT TACTGCAGGC CCAAACATTC CCACCACAGA AAGACGTTGC ACAATATTTT TTGATATTGG ATGGTTTCGA ATCAAACTAA ATGAATTACT ATGTTAGAGA CATCATTTTT CCGAGGTATA ATGTGGACGA ATGGAAGATT CTTGTCTACG GATCAATTTC AAAATGTAGC TATACTGGAA ACGACATATT GTATCTCGCA TTACTAAGGA AAACTATTTT TAAGCACAGA ACCACAAATC GTGATGACCT TATCTGCTCA 
TAGCCTCATA TTCTCTAGGA AAACACGTAT GAAGTTAACC CTAGATATTT TTCCAAGCTA TATTCCCTAT GAGGAATCCT TAATGAA.AGC TTGAAATTAA TAGCTACAAC TGGAAAGTAA ACGTTCCCTG

ATGTCACAAA

TGATATCGAA

TGACTATATG

AAGCATTGTT

ATTATTGAGT

ATGGCCTCTT

TTTCAATTCG

TTAACACTCT

TTTTGGAAAT

CTGTCTCGCA

AGGTTATAGT

TTTGCATGCA

GATAGCACGT

TTACGATCCT

TGTCCAATAT

TAGTTTACAA

TCCAAAAGAT

GCAAAGATAT

AGTTTCTACT

TAAGCGTATC

TGCGAACTTC

CGCATTGAAT

AAACATATTT

TGATAATTTA

TGAGAACAGT

TATTTGTTCA

CATACAAGGT

ACAGCCTTCT

TCAAGGTAAG

GAATATCGAG

CATGAACAAG

GGAAATTATT

GCCTTTAATC

TCCTATTTAA

AGTGCCCACT

CAAAAGTTAA

TGTATTGCTG

TACCATGAAA

TTTCTGA1ACA

TCTTTGATAT

AAAATGAATG

TCGTATATTT

GGAACAGGAA

GTATGGCTGG

ACTGTACAAT

TTACAGGTTG

GAATCTTTGC

GATTTAGACG

TCTAAAATTA

TCTACACATT

ATATTAGTGA

ATTGAAGAGC

ATTCACTTAA

GACTTGATAG

GAACCACTGG

AGTGAAAATG

CAAACTTACG

TATCTGGAAC

AGAGCTGCTT

TTATTTGCTA

GCGGAGAGTG

GTTTTTCAAA

AATATCGACT

AAAGAAATAT

AAGTATTGTT

TTCAAGATTT

GTCCATGTGC

ATGTGTTTAA

ATTTGACCAT

TATAAATATA

TTAAATCRCA

TGACCACTCC

CTGCTGATCC

AAACCTACAC

AACCTGTTGA

ATAAATTGAA

AAGAGCCATT

AAGAAAAGAC

CCGCACTGTT

ATATAAAATT

GCATTCCAAC

TGTCTAGATG

AGATAACGGA

TTGTGCAGGC

ACTTCAAATG

CTCTTTACAA

GACAAGGTGA

ACTTATCAAT

CTAGATACGA

ATGGACGA~AG

TTGAAAGACT

GTTCCCTATC

ATTTATCCGA

ATTTGAGTCC

C?.GAATCTGT

GCAAGAACAC

GAGATGCTCC

CGGCGCGTGG

GCAAGTAGCC

ATGAGTAAGC

AAAAAAAAAA

AATAACCGAA 420 AGTACTTTTC 480 GGAAGTTGCT 540 AGACATAACT 600 TTTGAAGAAG 660 GTTGGTTTCT 720 AACCCTATAT 780 TCTTTTGGTA 840 TTGCTTGTTC 900 TAACAAATAT 960 CATTTACACG 1020 TGTTATGTTT 1080 TGGCGAACTC 1140 CTGTACAGAC 1200 TTTTCATTCT 1260 GCCCAAGTAT 1320 AAGTGCCATC 1380 AAGCGCGATA 1440 AATTTCGAAG 1500 TGCGAGTATT 1560 AAAGAAGAAA 1620 TTTGGCTATT 1680 TGCACTTCGT 1740 ATTGCATACA 1800 TAAAGTCAGG 1860 TCATTTCAAT 1920 ATCAGCGATT ACTTCAGCGA TATTCACGAA TGATTATCTC CCTGGAAGGT ATCCAGAGGG 1980

CAGGATACGT

GAACAATTAT

ATATGGTAAT

AAAAGCGCCC

AGTTTAGTTG

ATTGAGCCTG


TCGAAACAAC

CAAGTAAACC

CAAAGATTAA

TACTGTATGG

CTCTTTTTGG

ATTTTGTTTT

AACTACGTTA

TTGTATTTTT

TACGTATAAC

AAAAACAATG

CGGCCGGCGA

GTCTTA

TATAAATATT TATACATAGT TGTTCCCACG CTCTACGCTC CGTTATTAAT TCAGTCCACT AATGAGGAGA CTGAACGGCG TAATGTTCTT CACTTGGTAT

GGGATAGAAT

TGTTTCTTGG

AGAAACTATT

CAAAATTGTT

TCTTACCAGG

2040 2100 2160 2220 2280 2306 INFORMATION FOR SEQ ID NO:10: SEQUENCE CHARACTERISTICS: LENGTH: 479 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: Met Asn Val Thr Pro Glu Val Ala Phe Pro Arg Glu Tyr Gln Thr Asn Cys Leu Ala Leu Ile Leu Ser Tyr Ile Ser Ala Asp 25 Thr Asp Ile Thr Thr Leu Lys Gln Gly Tyr Ser Gly Lys Thr Lys Tyr Val Glu Phe Asn Ala Leu Val Ser Asn Pro Trp Lys Lys Thr Asn Leu His Ala Trp Leu Glu Pro Pro Leu Leu Leu Tyr Pro 90 Ile Ala Arg Gln Tyr Lys Leu Ile Pro Thr Thr Asp Tyr Asp Pro His Asn Ile 115 Gln Val Glu Glu Phe Leu Leu Val Val Gln Tyr Ser Leu Gln Glu Lys Thr Leu 110 Thr Cys Leu Asp Ala Ala Phe Leu 130 Leu Phe Ile Leu Asp Gly Ser Leu Gln Asp Leu Asn Lys Tyr Lys Leu Asn Glu 140 Leu Glu Ile Asn Ile Ile Tyr Thr Pro Lys Asp Ser 160 Thr Ser Phe Leu 175 Phe Pro Arg Tyr 190 Arg Cys Gly Glu Gln Arg Tyr Ser 180 Asn Val Asp Glu 195 Thr His Cys Ile Val Met Val Ser Thr Leu Val Met Ser Leu Met Glu Asp Ser Cys Leu Arg Lys Arg Ile Ile Glu Glu Gln Ile


Thr Asp Cys Thr Asp Asp Gln Phe Gln Asn 225 230 His Leu Ile Val Gln Ala Phe 245 Ala Leu Asn Asp Leu Ile Asp 1 260 Ile Thr Lys Glu Asn Ile Phe 275 Ile Lys Leu Phe Leu Ser Thr 2 290 295 Gly Glu Ser Ala Ile Thr Thr 1 305 310 Thr Tyr Asp Leu Ser Ile Ile 325 Ile Cys Ser Tyr Leu Glu Pro 340 Lys Thr Arg Ile Ile Gln Gly 355 Lys Glu Val Asn Pro Arg Tyr 370 375 Arg Leu Leu Ala Ile Phe Gi 385 390 Glu Ser Gly Ser Leu Ser Ala 1 405 Asn Ile Glu Val Phe Gln Asn 1 420 40 Ile Ala Thr Thr Met Asn Lys 435 Arg Trp Lys Val Asn Val Pro 450 455 Ser Val His Phe Asn Ile Ser 465 470 INFORMATION FOR SEQ ID NO:11: SEQUENCE CHARACTERISTICS Tyr 250 Trp Leu Asn Asp Tyr 330 Asp Ala Pro Phe Glu 410 Glu Asp Ile Val Ala Ala 235 Thr Gly Asn Pro Lys Tyr Ala Leu Tyr 285 Leu Ser Glu 300 Asp Leu Glu 315 Leu Leu Ile Ala Ser Ile Tyr Gly Arg 365 Ser Leu Phe 380 Pro Ile Gln 395 Glu Ser Leu Leu His Thr Tyr Leu Ser 445 Ile Lys Glu 460 Ile 240 Phe Arg Ala Gln Gln 320 Tyr Arg Lys Glu Ala 400 Leu Val Glu 14 Tyr Phe Ser Asp Ile His Glu (ix) LENGTH: 1975 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear MOLECULE TYPE: cDNA

FEATURE:

NAME/KEY: CDS LOCATION: 443..1747 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: CGTGTGCTCT TCTATAGTAA TTTGACATTC TCTAAACGCA GAGACCTCTT ATAAla ATTC AACAAATAAG GAATGTTACC TATGCTAGTC GCAACTCTCT CGTAAGTTGA GGGTTGCTAA 120 CAGAAAAACG ATGAGAAGAA ACTTTTGAAA TATGAAPAAA GAATGCGGGC GTCCGTAAAG AGGCTTTCGA ATACACTCCT CACGCTTCTC CTGTGTATTT CTTTGTTCTT TGCCGTTGTT AAAATCTCAC ACTAAAATTG CAGAAAAAAG AACAATACCA TTAAAACCAG TC ATG TCC AATATTGTGT GAAAGCAGCA CGAAACAGAG AGCTAGAATC GCAAGTGTCC AGAATATGCA TTCAGCAAAA ATCAACTCTT TGTGATAAAA TACGTTAGTA AGAAATCGGC ATTGAAAAAA TGTACAATAT CAGTAAATAA AATTGGCCAA ATG CAA CAA GTC CAA CAT TGT GTC 2 Met Ser Met Gln Gln Val Gln His Cys Val 1 5 GAT CCA CAA GAA AAA CCG GAC TGG TCG AGC 15 GCA Ala

GGA

Gly

TCA

Ser

CAC

His

CCT

Pro 75

GCC

Ala

CCT

Pro

AGT

Ser

TTG

Leu

AAT

Asn 155

GAA

Glu

GAT

Asp

CAT

His

GAA

Glu TA Tyr

CTG

Leu

ATA

Ile

GAC

Asp

AAG

Lys

ATG

Met

CCA

Pro

AGG

Arg 140

AAT

Asn

TCG

Ser

GAG

Glu

AGT

Ser

GTA

Val

TTG

Leu

AAC

Asn 45

TGT

Cys

CTT

Leu

CAT

His

AAA

Lys

GTA

Val 125

AAT

Asn

GAT

Asp

CCG

Pro

GAT

Asp

AAT

Asn

CTT

Leu

AAG

Lys

AAG

Lys

GCA

Ala

TGC

Cys

TTA

Leu

CAA

Gln 110

AAG

Lys

CAA

Gln

TCG

Ser

TCT

Ser

GAA

Glu 190

AAG

Lys

CGA

Arg

AAG

Lys

GTA

Val

TAC

Tyr

TAT

Tyr

ATG

Met

TTT

Phe

AAC

Asn

CTG

Leu

TTC

Phe

ATT

Ile 175

GAG

Glu

AGC

Ser

CTA

Leu

TTG

Leu

ATG

Met

ATA

Ile

TAT

Tyr 80

AAC

Asn

GCT

Ala

GGT

Gly

TTT

Phe

GTA

Val 160

ACT

Thr

GAA

Glu

ATT

Ile Asp

ACT

Thr

CTG

Leu

GCG

Ala 65

ATA

Ile

CTT

Leu

TGG

Trp

GGG

Gly

GGT

Gly 145

ATA

Ile

AGG

Arg

CCA

Pro

ACT

Thr Pro

AAT

Asn

AAA

Lys 50

TCA

Ser

GAC

Asp

TTC

Phe

ACA

Thr

AGG

Arg 130

ACA

Thr

CCA

Pro

AGA

Arg

GGA

Gly

GGA

Gly 210 Gln

GCG

Ala 35

CAA

Gln

CAG

Gln

AGT

Ser

AGA

Arg

CCG

Pro 115

TTT

Phe

CCA

Pro

GAA

Glu

AAG

Lys

AAC

Asn 195

ACC

Thr

ACA

Glu 20

ACA

Thr

GAT

Asp

AAA

Lys

ATT

Ile

CAA

Gln 100

AGC

Ser

ACT

Thr

ACT

Thr

CTA

Leu

TTA

Leu 180

GAC

Asp

AGA

Arg

AGT

Lys

TCG

Ser

GAA

Glu

ATG

Met

CCC

Pro 85

AGT

Ser

CCC

Pro

TCT

Ser

AAA

Lys

CCC

Pro 165

GCA

Ala

GGT

Gly

AAT

Asn

GAG

Pro

ATT

Ile

GAG

Glu

AAT

Asn

TTG

Leu

TTA

Leu

AAA

Lys

TCT

Ser

GTT

Val 150

CCC

Pro

TTT

Phe

TTG

Leu

GTA

Val

GAA

Asp

TTA

Leu

GTT

Val

GAA

Glu

GAO

Glu

TCT

Ser

AAG

Lys

GAT

Asp 135

AGG

Arg

ATG

Met

GAA

Glu

TCT

Ser

GAT

Asp 215

GAG

Trp

TAT

Tyr

GCT

Ala

AAA

Lys

CCG

Pro

AAT

Asn

AAC

Asn 120

CCG

Pro

AAA

Lys

CAA

Gln

GAG

Glu

TTA

Leu 200

TCT

Ser

CCA

Ser Ser AAT ACT Asn Thr AGA TGT Arg Cys CAC ATG His Met AAA AAA Lys Lys TCT TCA Ser Ser 105 AAA CGC Lys Arg AAA GAG Lys Glu AGC CAA Ser Gln ACC AAT Thr Asn 170 GAT GAG Asp Glu 185 AAA AGC Lys Ser GAT GAG Asp Glu TTA GGT 520 568 616 664 712 760 808 856 904 952 1000 1048 1096 1144 205 TAT GAA AAC CAT GAA AGT GAC CCT


Tyr

GTG

Val 235

AAA

Lys

AGA

Arg

GAA

Glu

GCA

Ala

TGC

Cys 315

GTA

Val

GTC

Val

ATT

Ile

TTC

Phe

GAT

Asp 395

ATT

Ile

ATG

Glu 220

CAA

Gln

CCG

Pro

ATA

Ile

GAA

Glu

TAT

Tyr 300

CCA

Pro

TTT

Phe

AGT

Ser

GAA

Glu

AGA

Arg 380

GAA

Glu

TTG

Leu

GAT

Asn

GAA

Glu

CAA

Gln

CCA

Pro

ATA

Ile 285

AAA

Lys

TGG

Trp

AAT

Asn

AAG

Lys

TGT

Cys 365

GAT

Asp

ATT

Ile

GTC

Val

TTG

His

AGC

Ser

TCA

Ser

AAT

Asn 270

ATA

Ile

ATT

Ile

CAA

Gln

GAA

Glu

ATG

Met 350

GTA

Val

TTG

Leu

ATA

Ile

ACA

Thr

GCA

Ala 430 Glu Ser AGA AGC Arg Ser 240 GAA TTG Glu Leu 255 TCT TTG Ser Leu CGG CTT Arg Leu GTG GAT Val Asp TTA GTG Leu Val 320 AGA AGA Arg Arg 335 TGC AGC Cys Ser AAA TTA Lys Leu CAA ATT Gln Ile TTT AGG Phe Arg 400 GAC GAC Asp Asp 415 TTA ACA Leu Thr Asp 225

GGG

Gly

AAG

Lys

TTA

Leu

TGC

Cys

GAG

Glu 305

TGT

Cys

CGC

Arg

TTG

Leu

GTG

Val

AGG

Arg 385

AAA

Lys

CAG

Gln

GAA

Glu Pro Thr Ser Glu Glu Glu Pro Leu Gly

AGA

Arg

ACG

Thr

GTA

Val

AAC

Asn 290

TAC

Tyr

GGG

Gly

AAG

Lys

ATG

Met

AAG

Lys 370

TAT

Tyr

CTG

Leu

TAC

Tyr

CCT

230 ACG AAA CAA AAT AAG GCA GTT GGA Thr Lys Gln Asn Lys Ala Val Gly 245 250 GCA AAA GCC CTG AGG AAA AGG GGC Ala Lys Ala Leu Arg Lys Arg Gly 260 265 AAG AAG TAT TGC AAA ATG ACT ACT Lys Lys Tyr Cys Lys Met Thr Thr 275 280 GAT TTT GAA TTA CCA AGA GAA GTA Asp Phe Glu Leu Pro Arg Glu Val 295 AAC ATA AAC GCG TCA AGA TTG GTT Asn Ile Asn Ala Ser Arg Leu Val 310 TTA GTA TTA AAT TGT ACA TTC ATT Leu Val Leu Asn Cys Thr Phe Ile 325 330 GAT CCA AGA ATT GAC CAT TTT ATA Asp Pro Arg Ile Asp His Phe Ile 340 345 TTG ACG TCA AAA GTG GAT GAT GTT Leu Thr Ser Lys Val Asp Asp Val 355 360 GAA TTA ATT ATC GGT GAA AAA TGG Glu Leu Ile Ile Gly Glu Lys Trp 375 GAT GAT TTT GAT GGC ATC AGA TAC Asp Asp Phe Asp Gly Ile Arg Tyr 390 GGA TCG ATG TTA CAA ACC ACC AAT Gly Ser Met Leu Gln Thr Thr Asn 405 410 AAT ATT TGG AAG AAA AGA ATT GAA Asn Ile Trp Lys Lys Arg Ile Glu 420 425 TTA TAACATATCC AGTATTAACT 1192 1240 1288 1336 1384 1432 1480 1528 1576 1624 1672 1720 1767 1827 1887 1947 1975 Met Asp Leu Pro Leu 435 AAAAGTATAT ATTTGACCAA TACCTGACAT ATCTTCTAAA CGAGCTAATG TTAGCTCCAT CTTTGCACTT ATGATTGGAT ATCTTTGCAG CTTCCGCGAA GGTAGTAGCT TGAAGTTTTT ATTGCAGAAT CTTCAAACAA TTCTATGG INFORMATION FOR SEQ ID NO:12: SEQUENCE CHARACTERISTICS: LENGTH: 435 amino acids GCATGCCTTT AGCCCTATAA CAGCCCTCAA ACGCTTTTGT CATCCATAGT TCTTGCTAAA TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: Met Ser Met Gln Gln Val Gln His Cys Val Ala Glu Val Leu Asp Thr Leu Ala Ile Leu Trp Gly Gly 145 Ile Arg Pro Thr Asp 225 Lys Leu Cys Glu 305 Pro Asn Lys Ser Asp Phe Thr Arg 130 Thr Pro Arg Gly Gly 210 Pro Arg Thr Val Asn 290 Gln Ala 35 Gln Gln Ser Arg Pro 115 Phe Pro Glu Lys Asn 195 Thr Thr Thr Ala Lys 275 Asp Glu Thr Asp Lys Ile Gln 100 Ser Thr Thr Leu Leu 180 Asp Arg Ser Lys Lys 260 Lys Phe Lys Ser Glu Met Pro Ser Pro Ser Lys Pro 165 Ala Gly Asn Glu Gln 245 Ala Tyr Glu Pro Ile Glu Asn 70 Leu Leu Lys Ser Val 150 Pro Phe Leu Val Glu 230 Asn Leu Cys Leu Ala
310 Asp Trp Ser Leu Tyr Asn Val Ala Arg Glu Lys His Glu Pro Lys Ser Asn Ser 105 Lys Asn Lys 120 Asp Pro Lys 135 Arg Lys Ser Met Gln Thr Glu Glu Asp 185 Ser Leu Lys 200 Asp Ser Asp 215 Glu Pro Leu Lys Ala Val Arg Lys Arg 265 Lys Met Thr 280 Pro Arg Glu 295 Ser Arg Leu Gly Ser His Pro 75 Ala Pro Ser Leu Asn 155 Glu Asp His Tyr Val 235 Lys Arg Glu Ala Cys 315 Arg Leu Lys Leu Val Met Tyr Ile Tyr Tyr Met Asn Phe Ala Asn Gly Phe Val 160 Thr Glu Ile Ser Ser 240 Leu Leu Leu Asp Val 320 Tyr Asn Ile Asn Cys Gly Leu Val Leu Asn Cys Thr Phe Ile Val Phe Asn Glu Arg Arg 325 330 335 Arg Lys Asp Pro Arg Ile Asp His Phe Ile Val Ser Lys Met Cys Ser 340 345 350 Leu Met Leu Thr Ser Lys Val Asp Asp Val Ile Glu Cys Val Lys Leu 355 360 365 Val Lys Glu Leu Ile Ile Gly Glu Lys Trp Phe Arg Asp Leu Gln Ile 370 375 380 Arg Tyr Asp Asp Phe Asp Gly Ile Arg Tyr Asp Glu Ile Ile Phe Arg 385 390 395 400 Lys Leu Gly Ser Met Leu Gln Thr Thr Asn Ile Leu Val Thr Asp Asp 405 410 415 Gln Tyr Asn Ile Trp Lys Lys Arg Ile Glu Met Asp Leu Ala Leu Thr 420 425 430 Glu Pro Leu 435

Claims

1. An isolated nucleic acid encoding an origin of replication (ORC) polypeptide selected from the group consisting of ORC1, ORC3, ORC4, ORC5 and ORC6.

2. The isolated nucleic acid of claim 1, wherein the ORC polypeptide is ORC1.

3. The isolated nucleic acid of claim 1, wherein the ORC polypeptide is ORC3.

4. The isolated nucleic acid of claim 1, wherein the ORC polypeptide is ORC4.

5. The isolated nucleic acid of claim 1, wherein the ORC polypeptide is ORC5.

6. The isolated nucleic acid of claim 1, wherein the ORC polypeptide is ORC6.

7. An isolated nucleic acid comprising at least 36 nucleotides of a natural ORC transcript which specifically hybridises with said transcript, said transcript selected from an ORC1, ORC3 and ORC6 transcript.

8. The isolated nucleic acid of claim 7, wherein said transcript is an ORC1 transcript.

9. The isolated nucleic acid of claim 7, wherein said transcript is an ORC3 transcript.

10. The isolated nucleic acid of claim 7, wherein said transcript is an ORC6 transcript.

11. A method of making an ORC polypeptide, comprising the step of expressing a nucleic acid according to claim 1.

12. A method of identifying an agent which specifically binds an ORC polypeptide, comprising the steps of:
i) expressing a nucleic acid according to claim 1 to produce an ORC polypeptide;
ii) contacting a candidate agent with said ORC polypeptide; and
iii) measuring the binding affinity of said agent for said ORC polypeptide.

Dated this 12th day of November 1998

COLD SPRING HARBOR LABORATORY and THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
By their Patent Attorneys
GRIFFITH HACK
Fellows Institute of Patent Attorneys of Australia