AU754024B2

AU754024B2 - Haemophilus adhesion proteins

Info

Publication number: AU754024B2
Application number: AU47162/00A
Authority: AU
Inventors: Stephen J. Barenkamp; Joseph W. St. Geme
Original assignee: University of Washington; St Louis University; Washington University in St Louis WUSTL
Current assignee: St Louis University; Washington University in St Louis WUSTL
Priority date: 1995-03-24
Filing date: 2000-07-12
Publication date: 2002-10-31
Anticipated expiration: 2016-03-22
Also published as: AU4716200A

Description

HAEMOPHILUS ADHESION PROTEINS

FIELD OF THE INVENTION

The invention relates to novel Haemophilus adhesion proteins, nucleic acids, and antibodies.

BACKGROUND OF THE INVENTION

Most bacterial diseases begin with colonization of a particular mucosal surface (Beachey et al., 1981, J. Infect. Dis. 143:325-345). Successful colonization requires that an organism overcome mechanical cleansing of the mucosal surface and evade the local immune response. The process of colonization is dependent upon specialized microbial factors that promote binding to host cells (Hultgren et al., 1993, Cell 73:887-901). In some cases the colonizing organism will subsequently enter (invade) these cells and survive intracellularly (Falkow, 1991, Cell 65:1099-1102). Haemophilus influenzae is a common commensal organism of the human respiratory tract (Kuklinska and Kilian, 1984, Eur. J. Clin. Microbiol. 3:249-252). It is the most common cause of bacterial meningitis and a leading cause of other invasive (bacteraemic) diseases. In addition, this organism is responsible for a sizeable fraction of acute and chronic otitis media, sinusitis, bronchitis, and pneumonia.

Haemophilus influenzae is a human-specific organism that normally resides in the human nasopharynx and must colonize this site in order to avoid extinction. This microbe has a number of surface structures capable of promoting attachment to host cells (Guerina et al., 1982, J. Infect. Dis. 146:564-; Pichichero et al., 1982, Lancet ii:960-962; St. Geme et al., 1993, Proc. Natl. Acad. Sci. U.S.A. 90:2875-2879). In addition, H. influenzae has acquired the capacity to enter and survive within these cells (Forsgren et al., 1994, Infect. Immun. 62:673-679; St. Geme and Falkow, 1990, Infect. Immun. 58:4036-4044; St. Geme and Falkow, 1991, Infect. Immun. 59:1325-1333, Infect. Immun. 59:3366-3371). As a result, this bacterium is an important cause of both localized respiratory tract and systemic disease (Turk, 1984, J. Med. Microbiol. 18:1-16). Nonencapsulated, non-typable strains account for the majority of local disease (Turk, 1984, supra); in contrast, serotype b strains, which express a capsule composed of a polymer of ribose and ribitol-5-phosphate (PRP), are responsible for over 95% of cases of H. influenzae systemic disease (Turk, 1982, Clinical importance of Haemophilus influenzae, p. 3-9. In S.H. Sell and P.F. Wright, Haemophilus influenzae epidemiology, immunology, and prevention of disease. Elsevier/North-Holland Publishing Co., New York).

The initial step in the pathogenesis of disease due to H. influenzae involves colonization of the upper respiratory mucosa (Murphy et al., 1987, J. Infect. Dis. 5:723-731). Colonization with a particular strain may persist for weeks to months, and most individuals remain asymptomatic throughout this period (Spinola et al., 1986, J. Infect. Dis. 154:100-109). However, in certain circumstances colonization will be followed by contiguous spread within the respiratory tract, resulting in local disease in the middle ear, the sinuses, the conjunctiva, or the lungs. Alternatively, on occasion bacteria will penetrate the nasopharyngeal epithelial barrier and enter the bloodstream.

In vitro observations and animal studies suggest that bacterial surface appendages called pili (or fimbriae) play an important role in H. influenzae colonization. In 1982 two groups reported a correlation between piliation and increased attachment to human oropharyngeal epithelial cells and erythrocytes (Guerina et al., supra; Pichichero et al., supra). Other investigators have demonstrated that anti-pilus antibodies block in vitro attachment by piliated H. influenzae (Forney et al., 1992, J. Infect. Dis. 165:464-470; van Alphen et al., 1988, Infect. Immun. 56:1800-1806). Recently Weber et al. insertionally inactivated the pilus structural gene in an H. influenzae type b strain and thereby eliminated expression of pili; the resulting mutant exhibited a reduced capacity for colonization of year-old monkeys (Weber et al., 1991, Infect. Immun. 59:4724-4728).

A number of reports suggest that nonpilus factors also facilitate Haemophilus colonization. Using the human nasopharyngeal organ culture model, Farley et al. (1986, J. Infect. Dis. 161:274-280) and Loeb et al. (1988, Infect. Immun. 49:484-489) noted that nonpiliated type b strains were capable of mucosal attachment. Read and coworkers made similar observations upon examining nontypable strains in a model that employs nasal turbinate tissue in organ culture (1991, J. Infect. Dis. 163:549-558). In the monkey colonization study by Weber et al. (1991, supra), nonpiliated organisms retained a capacity for colonization, though at reduced densities; moreover, among monkeys originally infected with the piliated strain, virtually all organisms recovered from the nasopharynx were nonpiliated. All of these observations are consistent with the finding that nasopharyngeal isolates from children colonized with H. influenzae are frequently nonpiliated (Mason et al., 1985, Infect. Immun. 49:98-103; Brinton et al., 1989, Pediatr. Infect. Dis. J. 8:554-561).

Previous studies have shown that H. influenzae are capable of entering (invading) cultured human epithelial cells via a pili-independent mechanism (St. Geme and Falkow, 1990, supra; St. Geme and Falkow, 1991, supra). Although H. influenzae is not generally considered an intracellular parasite, a recent report suggests that these in vitro findings may have an in vivo correlate (Forsgren et al., 1994, supra). Forsgren and coworkers examined adenoids from 10 children who had their adenoids removed because of longstanding secretory otitis media or adenoidal hypertrophy. In all 10 cases there were viable intracellular H. influenzae. Electron microscopy demonstrated that these organisms were concentrated in the reticular crypt epithelium and in macrophage-like cells in the subepithelial layer of tissue. One possibility is that bacterial entry into host cells provides a mechanism for evasion of the local immune response, thereby allowing persistence in the respiratory tract.

Thus, a vaccine for the therapeutic and prophylactic treatment of Haemophilus infection is desirable. Accordingly, it is an object of the present invention to provide for recombinant Haemophilus Adherence (HA) proteins and variants thereof, and to produce useful quantities of these HA proteins using recombinant DNA techniques.

It is a further object of the invention to provide recombinant nucleic acids encoding HA proteins, and expression vectors and host cells containing the nucleic acid encoding the HA protein.

An additional object of the invention is to provide monoclonal antibodies for the diagnosis of Haemophilus infection.

A further object of the invention is to provide methods for producing the HA proteins, and a vaccine comprising the HA proteins of the present invention.

Methods for the therapeutic and prophylactic treatment of Haemophilus infection are also provided.

SUMMARY OF THE INVENTION

In accordance with the foregoing objects, the present invention provides recombinant HA proteins, and isolated or recombinant nucleic acids which encode the HA proteins of the present invention. Also provided are expression vectors which comprise DNA encoding a HA protein operably linked to transcriptional and translational regulatory DNA, and host cells which contain the expression vectors.

The invention also provides methods for producing HA proteins which comprise culturing a host cell transformed with an expression vector and causing expression of the nucleic acid encoding the HA protein to produce a recombinant HA protein.

The invention also includes vaccines for Haemophilus influenzae infection comprising an HA protein for prophylactic or therapeutic use in generating an immune response in a patient. Methods of treating or preventing Haemophilus influenzae infection comprise administering a vaccine.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A, 1B, and 1C depict the nucleic acid sequence of HA1.

Figure 2 depicts the amino acid sequence of HA1.

Figures 3A, 3B, 3C, 3D, 3E, 3F and 3G depict the nucleic acid sequence and amino acid sequence of HA2.

Figure 4 shows the schematic alignment of HA1 and HA2. Regions of sequence similarity are indicated by shaded, striped, and open bars, corresponding to N-terminal domains, internal domains, and C-terminal domains, respectively. The solid circles represent a conserved Walker box ATP-binding motif (GINVSGKT). Numbers above the bars refer to amino acid residue positions in the full-length proteins. Numbers in parentheses below the HA2 bars represent percent similarity/percent identity between these domains and the corresponding HA1 domains. The regions of HA2 defined by amino acid residues 51 to 173, 609 to 846, and 1292 to 1475 show minimal similarity to amino acids 51 to 220 of HA1.

Figure 5 depicts the homology between the N-terminal amino acid sequences of HA1 and HA2. Single letter abbreviations are used for the amino acids. A line indicates identity between the residues, and two dots indicate conservative changes, i.e. similarity between residues.

Figure 6 depicts the restriction maps of phage 11-17 and plasmid pT7-7 subclones.

Figure 7 depicts the restriction map of pDC400 and derivatives. pDC400 contains a 9.1 kb insert from strain C54 cloned into pUC19. Vector sequences are represented by hatched boxes. Letters above the top horizontal line indicate restriction enzyme sites: Bg, BglII; E, EcoRI; H, HindIII; P, PstI; S, SalI; Ss, SstI; X, XbaI. The heavy horizontal line with arrow represents the location of the hsf locus within pDC400 and the direction of transcription. The striated horizontal line represents the 3.3 kb intragenic fragment used as a probe for Southern analysis. The plasmid pDC602, which is not shown, contains the same insert as pDC601, but in the opposite orientation.

Figure 8 shows the identification of plasmid-encoded proteins using the bacteriophage T7 expression system. Bacteria were radiolabelled with trans-[35S]-label, and whole cell lysates were resolved on a SDS-polyacrylamide gel. Proteins were visualized by autoradiography. Lane 1, E. coli BL21(DE3)/pT7-7 uninduced; lane 2, BL21(DE3)/pT7-7 induced; lane 3, BL21(DE3)/pDC602 uninduced; lane 4, BL21(DE3)/pDC602 induced; lane 5, BL21(DE3)/pDC601 uninduced; lane 6, BL21(DE3)/pDC601 induced. The plasmids pDC602 and pDC601 are derivatives of pT7-7 that contain the 8.3 kb XbaI fragment from pDC400 in opposite orientations. The asterisk indicates the overexpressed protein in BL21(DE3)/pDC601.

Figure 9 depicts the Southern analysis of chromosomal DNA from H. influenzae strains C54 and 11, probing with HA2 versus HA1. DNA fragments were separated on a 0.7% agarose gel and transferred bidirectionally to nitrocellulose membranes prior to probing with either HA1 or HA2. Lane 1, C54 chromosomal DNA digested with BglII; lane 2, C54 chromosomal DNA digested with ClaI; lane 3, C54 chromosomal DNA digested with PstI; lane 4, 11 chromosomal DNA digested with BglII; lane 5, 11 chromosomal DNA digested with ClaI; lane 6, 11 chromosomal DNA digested with XbaI. A. Hybridization with the 3.3 kb PstI-BglII intragenic fragment of HA2 from strain C54. B. Hybridization with the 1.6 kb StyI-SspI intragenic fragment of HA1 from strain 11.

Figure 10 depicts the comparison of cellular binding specificities of E. coli harboring HA2 versus HA1. Adherence was measured after incubating bacteria with eucaryotic cell monolayers for 30 minutes as described and was calculated by dividing the number of adherent colony forming units by the number of inoculated colony forming units (St. Geme et al., 1993). Values are the mean ± SEM of measurements made in triplicate from representative experiments. The plasmid pDC601 contains the HA2 gene from H. influenzae strain C54, while pHMW8-5 contains the HA1 gene from nontypable H. influenzae strain 11. Both pDC601 and pHMW8-5 were prepared using pT7-7 as the cloning vector.
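The adherence calculation described for Figure 10 can be sketched as follows. This is only an illustration of the stated arithmetic (adherent colony forming units divided by inoculated colony forming units, summarized as mean and standard error over triplicate measurements). The function names and the example counts are hypothetical and are not taken from the specification.

```python
from math import sqrt
from statistics import mean, stdev


def adherence_fraction(adherent_cfu, inoculated_cfu):
    """Adherent CFU divided by inoculated CFU, as described for Figure 10."""
    if inoculated_cfu <= 0:
        raise ValueError("inoculated CFU must be positive")
    return adherent_cfu / inoculated_cfu


def mean_and_sem(values):
    """Mean and standard error of the mean for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for an SEM")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical triplicate counts: (adherent CFU, inoculated CFU)
replicates = [(40, 1000), (50, 1000), (60, 1000)]
fractions = [adherence_fraction(a, i) for a, i in replicates]
avg, sem = mean_and_sem(fractions)
print(f"adherence = {avg:.3f} +/- {sem:.4f}")
```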

Figure 11 depicts the comparison of the N-terminal extremities of HA2, HMW1, HMW2, AIDA-I, Tsh, and SepA. The N-terminal sequence of HA2 is aligned with those of HA1 (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press.), HMW1 and HMW2 (Barenkamp, S.J., and E. Leininger. 1992. Cloning, expression, and DNA sequence analysis of genes encoding nontypeable Haemophilus influenzae high molecular weight surface-exposed proteins related to filamentous hemagglutinin of Bordetella pertussis. Infect. Immun. 60:1302-1313.), AIDA-I (Benz, I., and M.A. Schmidt. 1992. AIDA-I, the adhesin involved in diffuse adherence of the diarrhoeagenic Escherichia coli strain 2787 (O126:H27), is synthesized via a precursor molecule. Mol. Microbiol. 6:1539-1546.), Tsh (Provence, D., and R. Curtiss III. 1994. Isolation and characterization of a gene involved in hemagglutination by an avian pathogenic Escherichia coli strain. Infect. Immun. 62:1369-1380.), and SepA (Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot. 1995. SepA, the major extracellular protein of Shigella flexneri: autonomous secretion and involvement in tissue invasion. Mol. Microbiol. 17:123-135.). A consensus sequence is shown on the lower line.

Figure 12 depicts the Southern analysis of chromosomal DNA from epidemiologically distinct strains of H. influenzae type b. Chromosomal DNA was digested with BglII, separated on a 0.7% agarose gel, transferred to nitrocellulose, and probed with the 3.3 kb PstI-BglII intragenic fragment of hsf from strain C54. Lane 1, strain C54; lane 2, strain 1081; lane 3, strain 1065; lane 4, strain 1058; lane 5, strain 1060; lane 6, strain 1053; lane 7, strain 1063; lane 8, strain 1069; lane 9, strain 1070; lane 10, strain 1076; lane 11, strain 1084.

Figure 13 depicts the Southern analysis of chromosomal DNA from non-type b encapsulated strains of H. influenzae. Chromosomal DNA was digested with BglII, separated on a 0.7% agarose gel, transferred to nitrocellulose, and probed with the 3.3 kb PstI-BglII intragenic fragment of hsf from strain C54. Lane 1, SM4 (type lane 2, SM72 (type lane 3, SM6 (type lane 4, Rd (type lane 5, SM7 (type lane 6, 142 (type lane 7, 327 (type lane 8, 351 (type lane 9, 134 (type lane 10, 219 (type lane 11, 346 (type lane 12, 503 (type f).

Figures 14A and 14B are the nucleic acid sequence of HA3.

Figure 15 is the amino acid sequence of HA3.

Figures 16A and 16B depict the homology between the amino acid sequences of HA1 and HA3. Single letter abbreviations are used for the amino acids. A line indicates identity between the residues, and two dots indicate conservative changes, i.e. similarity between residues.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides novel Haemophilus Adhesion (HA) proteins. In a preferred embodiment, the HA proteins are from Haemophilus strains, and in the preferred embodiment, from Haemophilus influenzae. In particular, H. influenzae encapsulated type b strains are used to clone the HA proteins of the invention. However, using the techniques outlined below, HA proteins from other Haemophilus influenzae strains, or from other bacterial species such as Neisseria spp. or Bordetella spp., may also be obtained.

Three HA proteins, HA1, HA2 and HA3, are depicted in Figures 2, 3 and 15, respectively. HA2 is associated with the formation of surface fibrils, which are involved in adhesion to various host cells. HA1 has also been implicated in adhesion to a similar set of host cells. When the HA1 or HA2 nucleic acid is expressed in a non-adherent strain of E. coli as described below, the E. coli acquire the ability to adhere to human host cells. It should be noted that in the literature, HA1 is referred to as hia (H. influenzae adherence) and HA2 is referred to as hsf (Haemophilus surface fibrils).

A HA protein may be identified in several ways. A HA nucleic acid or HA protein is initially identified by substantial nucleic acid and/or amino acid sequence homology to the sequences shown in Figures 1, 2, 3, 14 or 15. Such homology can be based upon the overall nucleic acid or amino acid sequence or portions thereof.

As used herein, a protein is a "HA protein" if the overall homology of the protein sequence to the amino acid sequence shown in Figure 2 and/or Figure 3 and/or Figure 15 is preferably greater than about 45 to 50%, more preferably greater than about 65% and most preferably greater than 80%. In some embodiments the homology will be as high as about 90 to 95 or 98%. That is, a protein that has at least 50% homology (or greater) to one, two or all three of the amino acid sequences of HA1, HA2 and HA3 is considered a HA protein. This homology will be determined using standard techniques known in the art, such as the Best Fit sequence program described by Devereux et al., Nucl. Acid Res. 12:387-395 (1984) or the BLASTX program (Altschul et al., J. Mol. Biol. 215:403-410 (1990)). The alignment may include the introduction of gaps in the sequences to be aligned. As noted below, in the comparison of proteins of different lengths, such as HA1 and HA3 with HA2, the homology is determined on the basis of the length of the shorter sequence.
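The preference tiers stated above can be expressed as a small lookup. The sketch below is illustrative only: it assumes a homology percentage has already been computed by an alignment program, and it takes 50% as the lower cut-off, since the text says "at least 50% homology (or greater)" even though the preferred range is given loosely as "about 45 to 50%". The function name and tier labels are hypothetical.

```python
def homology_tier(percent_homology):
    """Map an overall percent homology to the preference tiers in the text.

    Assumed cut-offs: >80 most preferred, >65 more preferred, >=50 preferred.
    The 50% floor is an assumption; the text gives "about 45 to 50%".
    """
    if not 0.0 <= percent_homology <= 100.0:
        raise ValueError("percent homology must be between 0 and 100")
    if percent_homology > 80.0:
        return "most preferred"
    if percent_homology > 65.0:
        return "more preferred"
    if percent_homology >= 50.0:
        return "preferred"
    return "below threshold"


for value in (98.0, 72.0, 51.0, 22.0):
    print(value, homology_tier(value))
```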

In a preferred embodiment, a HA protein is defined as having significant homology to either the N-terminal region or the C-terminal region, or both, of the HA1, HA2 and HA3 proteins depicted in Figures 4, 5 and 15. The N-terminal region of about amino acids is virtually identical as between HA1 and HA3 (98% homology), and as between either HA1 or HA3 and HA2 is 74%. As shown in Figure 11, the first 24 amino acids of the N-terminus of HA1 and HA2 have limited homology to several other proteins, but this homology is 50% or less. Thus, a HA protein may be defined as having homology to the N-terminal region of at least about preferably at least about 70%, and most preferably at least about 80%, with homology as high as 90 or 95% especially preferred. Similarly, the C-terminal region of at least about 75, preferably 100 and most preferably 125 amino acid residues is also highly homologous and can be used to identify a HA protein. As shown in Figure 16, the homology between the C-terminal 120 or so amino acids of HA1 and HA3 is about 98%, and as between either HA1 or HA3 and HA2 is also about 98%. Thus homology at the C-terminus is a particularly useful way of identifying a HA protein. Accordingly, a HA protein can be defined as having homology to the C-terminal region of at least about 60%, preferably at least about and most preferably at least about 80%, with homology as high as 90 or especially preferred. In a preferred embodiment, the HA protein has homology to both the N- and C-terminal regions.

In addition, a HA protein may be identified as containing at least one stretch of amino acid homology found at least in the HA1 and HA2 proteins as depicted in Figure 4. HA2 contains three separate stretches of amino acids (174 to 608, 847 to 1291, and 1476 to 1914, respectively) that show significant homology to the region of HA1 defined by amino acids 221 to 658.

The HA proteins of the present invention have limited homology to the high molecular weight protein-1 (HMW1) of H. influenzae, as well as the AIDA-I adhesin of E. coli. For the HMW1 protein, this homology is greatest between residues 540 of the HA1 protein and residues 1100 to about 1550 of HMW1, with homology in this overlap region. For the AIDA-I protein, there is a roughly homology between the first 30 amino acids of AIDA-I and HA1, and the overall homology between the proteins is roughly 22%.

In addition, the HA1, HA2 and HA3 proteins of the present invention have homology to each other, as shown in Figures 4, 5 and 16. As between HA1 and HA2, the homology is 81% similarity and 72% identity overall. HA3 and HA1 are 51% identical and 65% similar. Thus, for the purposes of the invention, HA1, HA2 and HA3 are all HA proteins.

An "HA1" protein is defined by substantial homology to the sequence shown in Figure 2. This homology is preferably greater than about 60%, more preferably greater than about 70% and most preferably greater than 80%. In preferred embodiments the homology will be as high as about 90 to 95 or 98%. Similarly, an "HA2" protein may be defined by the same substantial homology to the sequence shown in Figure 3, and a "HA3" protein is defined with reference to Figure 15, as defined above.

In addition, for sequences which contain either more or fewer amino acids than the proteins shown in Figures 2, 3 and 15, it is understood that the percentage of homology will be determined based on the number of homologous amino acids in relation to the total number of amino acids. Thus, for example, homology of sequences shorter than that shown in Figures 2, 3 and 15, as discussed below, will be determined using the number of amino acids in the shorter sequence.
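As an illustration of this rule, the sketch below computes a percentage from a pairwise alignment that has already been produced by some alignment program, using the ungapped length of the shorter sequence as the denominator. It counts identical residues only and does not score conservative substitutions, so it is a simplification of "homology" as the term is used here. The function name, the "-" gap character, and the example peptides are assumptions for illustration and do not come from the figures.

```python
def percent_identity_shorter(aligned_a, aligned_b, gap="-"):
    """Percent identity relative to the shorter ungapped sequence.

    Both inputs are rows of one pairwise alignment, so they must have the
    same length; gap positions never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Hypothetical aligned peptides: 5 identical positions, shorter length 6
print(round(percent_identity_shorter("MKTA-LL", "MKSAGLL"), 1))
```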

HA proteins of the present invention may be shorter than the amino acid sequences shown in Figures 2, 3 and 15. Thus, in a preferred embodiment, included within the definition of HA proteins are portions or fragments of the sequence shown in Figures 2, 3 and 15. Generally, the HA protein fragments may range in size from about 7 amino acids to about 800 amino acids, with from about 15 to about 700 amino acids being preferred, and from about 100 to about 650 amino acids also preferred. Particularly preferred fragments are sequences unique to HA; these sequences have particular use in cloning HA proteins from other organisms, to generate antibodies specific to HA proteins, or for particular use as a vaccine.

Unique sequences are easily identified by those skilled in the art after examination of the HA protein sequence and comparison to other proteins; for example, by examination of the sequence alignment shown in Figures 5 and 16. Preferred unique sequences include the N-terminal region of the HA1, HA2 and HA3 sequences, comprising roughly 50 amino acids and the C-terminal 120 amino acids, depicted in Figures 2, 3 and 15. HA protein fragments which are included within the definition of a HA protein include N- or C-terminal truncations and deletions which still allow the protein to be biologically active; for example, which still allow adherence, as described below. In addition, when the HA protein is to be used to generate antibodies, for example as a vaccine, the HA protein must share at least one epitope or determinant with the sequences shown in Figures 2, 3 and 15. In a preferred embodiment, the epitope is unique to the HA protein; that is, antibodies generated to a unique epitope exhibit little or no cross-reactivity with other proteins. However, cross reactivity with other proteins does not preclude such epitopes or antibodies for immunogenic or diagnostic uses. By "epitope" or "determinant" herein is meant a portion of a protein which will generate and/or bind an antibody. Thus, in most instances, antibodies made to a smaller HA protein will be able to bind to the full length protein.

In some embodiments, the fragments of the HA protein used to generate antibodies are small; thus, they may be used as haptens and coupled to protein carriers to generate antibodies, as is known in the art.

In addition, sequences longer than those shown in Figures 2, 3 and 15 are also included within the definition of HA proteins.

Preferably, the antibodies are generated to a portion of the HA protein which is exposed at the outer membrane, i.e. surface exposed. The amino-terminal portions of HA1, HA2 and HA3 are believed to be externally exposed proteins.

The HA proteins may also be identified as associated with bacterial adhesion. Thus, deletions of the HA proteins from the naturally occurring microorganism such as Haemophilus species result in a decrease or absence of binding ability. In some embodiments, the expression of the HA proteins in a non-adherent bacterium such as E. coli results in the ability of the organism to bind to cells.

In the case of the nucleic acid, the overall homology of the nucleic acid sequence is commensurate with amino acid homology but takes into account the degeneracy in the genetic code and codon bias of different organisms. Accordingly, the nucleic acid sequence homology may be either lower or higher than that of the protein sequence. Thus the homology of the nucleic acid sequence as compared to the nucleic acid sequences of Figures 1, 3 and 14 is preferably greater than about 15 more preferably greater than about 60% and most preferably greater than In some embodiments the homology will be as high as about 90 to 95 or 98%.

As outlined for the protein sequences, a preferred embodiment utilizes HA nucleic acids with substantial homology to the unique N-terminal and C-terminal regions of the HA1, HA2 and HA3 sequences.

In one embodiment, the nucleic acid homology is determined through hybridization studies. Thus, for example, nucleic acids which hybridize under high stringency to all or part of the nucleic acid sequences shown in Figures 1, 3 and 14 are considered HA protein genes. High stringency conditions include, but are not limited to, washes with 0.1X SSC at 65°C for 2 hours.

The HA proteins and nucleic acids of the present invention are preferably recombinant. As used herein, "nucleic acid" may refer to either DNA or RNA, or molecules which contain both deoxy- and ribonucleotides. The nucleic acids include genomic DNA, cDNA and oligonucleotides including sense and anti-sense nucleic acids. Specifically included within the definition of nucleic acid are anti-sense nucleic acids. An anti-sense nucleic acid will hybridize to the corresponding noncoding strand of the nucleic acid sequences shown in Figures 1, 3 and 14, but may contain ribonucleotides as well as deoxyribonucleotides. Generally, anti-sense nucleic acids function to prevent expression of mRNA, such that a HA protein is not made, or made at reduced levels. The nucleic acid may be double stranded, single stranded, or contain portions of both double stranded or single stranded sequence. By the term "recombinant nucleic acid" herein is meant nucleic acid, originally formed in vitro by the manipulation of nucleic acid by endonucleases, in a form not normally found in nature. Thus an isolated HA protein gene, in a linear form, or an expression vector formed in vitro by ligating DNA molecules that are not normally joined, are both considered recombinant for the purposes of this invention; i.e. the HA nucleic acid is joined to other than the naturally occurring Haemophilus chromosome in which it is normally found. It is understood that once a recombinant nucleic acid is made and reintroduced into a host cell or organism, it will replicate non-recombinantly, i.e. using the in vivo cellular machinery of the host cell rather than in vitro manipulations; however, such nucleic acids, once produced recombinantly, although subsequently replicated non-recombinantly, are still considered recombinant for the purposes of the invention.

Similarly, a "recombinant protein" is a protein made using recombinant techniques, i.e. through the expression of a recombinant nucleic acid as depicted above. A recombinant protein is distinguished from naturally occurring protein by at least one or more characteristics. For example, the protein may be isolated away from some or all of the proteins and compounds with which it is normally associated in its wild type host, or found in the absence of the host cells themselves. Thus, the protein may be partially or substantially purified. The definition includes the production of a HA protein from one organism in a different organism or host cell. Alternatively, the protein may be made at a significantly higher concentration than is normally seen, through the use of an inducible promoter or high expression promoter, such that the protein is made at increased concentration levels. Alternatively, the protein may be in a form not normally found in nature, as in the addition of an epitope tag or amino acid substitutions, insertions and deletions. Furthermore, although not normally considered "recombinant", proteins or portions of proteins which are synthesized chemically, using the sequence information of Figures 2, 3 and 15, are considered recombinant herein as well.

Also included within the definition of HA protein are HA proteins from other organisms, which are cloned and expressed as outlined below.

In the case of anti-sense nucleic acids, an anti-sense nucleic acid is defined as one which will hybridize to all or part of the corresponding non-coding sequence of the sequences shown in Figures 1, 3 and 14. Generally, the hybridization conditions used for the determination of anti-sense hybridization will be high stringency conditions, such as 0.1X SSC at 65°C.

Once the HA protein nucleic acid is identified, it can be cloned and, if necessary, its constituent parts recombined to form the entire HA protein nucleic acid. Once isolated from its natural source, contained within a plasmid or other vector or excised therefrom as a linear nucleic acid segment, the recombinant HA protein nucleic acid can be further used as a probe to identify and isolate other HA protein nucleic acids. It can also be used as a "precursor" nucleic acid to make modified or variant HA protein nucleic acids and proteins.

Using the nucleic acids of the present invention which encode HA protein, a variety of expression vectors are made. The expression vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a host genome. Generally, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the HA protein. "Operably linked" in this context means that the transcriptional and translational regulatory DNA is positioned relative to the coding sequence of the HA protein in such a manner that transcription is initiated. Generally, this will mean that the promoter and transcriptional initiation or start sequences are positioned 5' to the HA protein coding region. The transcriptional and translational regulatory nucleic acid will generally be appropriate to the host cell used to express the HA protein; for example, transcriptional and translational regulatory nucleic acid sequences from Bacillus will be used to express the HA protein in Bacillus. Numerous types of appropriate expression vectors, and suitable regulatory sequences, are known in the art for a variety of host cells.

In general, the transcriptional and translational regulatory sequences may include, but are not limited to, promoter sequences, leader or signal sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. In a preferred embodiment, the regulatory sequences include a promoter and transcriptional start and stop sequences.

Promoter sequences encode either constitutive or inducible promoters. The promoters may be either naturally occurring promoters or hybrid promoters. Hybrid promoters, which combine elements of more than one promoter, are also known in the art, and are useful in the present invention.

In addition, the expression vector may comprise additional elements. For example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a procaryotic host for cloning and amplification. Furthermore, for integrating expression vectors, the expression vector contains at least one sequence homologous to the host cell genome, and preferably two homologous sequences which flank the expression construct. The integrating vector may be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art.

In addition, in a preferred embodiment, the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selection genes are well known in the art and will vary with the host cell used.

The HA proteins of the present invention are produced by culturing a host cell transformed with an expression vector containing nucleic acid encoding a HA protein, under the appropriate conditions to induce or cause expression of the HA protein. The conditions appropriate for HA protein expression will vary with the choice of the expression vector and the host cell, and will be easily ascertained by one skilled in the art through routine experimentation. For example, the use of constitutive promoters in the expression vector will require optimizing the growth and proliferation of the host cell, while the use of an inducible promoter requires the appropriate growth conditions for induction. In addition, in some embodiments, the timing of the harvest is important. For example, the baculoviral systems used in insect cell expression are lytic viruses, and thus harvest time selection can be crucial for product yield.

Appropriate host cells include yeast, bacteria, archebacteria, fungi, and insect and animal cells, including mammalian cells. Of particular interest are Drosophila melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, and HeLa cells, and immortalized mammalian myeloid and lymphoid cell lines.

In a preferred embodiment, HA proteins are expressed in bacterial systems. Bacterial expression systems are well known in the art.

A suitable bacterial promoter is any nucleic acid sequence capable of binding bacterial RNA polymerase and initiating the downstream transcription of the coding sequence of HA protein into mRNA. A bacterial promoter has a transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region typically includes an RNA polymerase binding site and a transcription initiation site. Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose and maltose, and sequences derived from biosynthetic enzymes such as tryptophan. Promoters from bacteriophage may also be used and are known in the art. In addition, synthetic promoters and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and lac promoter sequences.

Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription.

In addition to a functioning promoter sequence, an efficient ribosome binding site is desirable. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon.
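The spacing rule stated above (a short SD-like sequence ending 3-11 nucleotides upstream of the initiation codon) can be illustrated with a minimal Python sketch. It is not part of the disclosed method. The consensus string AGGAGG, the choice to accept any sub-string of it at least 3 nucleotides long, the restriction to ATG initiation codons, and the function names are all assumptions made for illustration; real ribosome binding sites vary, and the 6-nucleotide consensus used here covers only part of the 3-9 nucleotide range mentioned in the text.

```python
import re

# Assumed consensus for illustration only; natural SD sequences vary.
SD_CONSENSUS = "AGGAGG"


def sd_fragments(min_len=3):
    """All sub-strings of the assumed consensus with length >= min_len, longest first."""
    n = len(SD_CONSENSUS)
    frags = {SD_CONSENSUS[i:j] for i in range(n) for j in range(i + min_len, n + 1)}
    return sorted(frags, key=lambda f: (-len(f), f))


def find_sd_sites(seq, min_gap=3, max_gap=11):
    """Return (sd_start, sd_motif, gap, atg_start) for each ATG preceded by an SD-like motif.

    gap is the number of nucleotides between the end of the motif and the A of ATG.
    Only the longest matching motif is reported for each ATG.
    """
    seq = seq.upper()
    hits = []
    for match in re.finditer("ATG", seq):
        atg = match.start()
        hit = None
        for frag in sd_fragments():
            for gap in range(min_gap, max_gap + 1):
                start = atg - gap - len(frag)
                if start >= 0 and seq[start:start + len(frag)] == frag:
                    hit = (start, frag, gap, atg)
                    break
            if hit:
                break
        if hit:
            hits.append(hit)
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: full consensus, six-nucleotide spacer, then ATG.
    print(find_sd_sites("TTTAGGAGGTTTTTTATGAAA"))
```

A scan of this kind could serve as a quick check that a designed expression construct places a candidate ribosome binding site at a plausible distance from the start codon; it does not predict translation efficiency.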

The expression vector may also include a signal peptide sequence that provides for secretion of the HA protein in bacteria. The signal sequence typically encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell, as is well known in the art. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria).

The bacterial expression vector may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed. Suitable selection genes include genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways.

These components are assembled into expression vectors. Expression vectors for bacteria are well known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, and Streptococcus lividans, among others.

The bacterial expression vectors are transformed into bacterial host cells using techniques well known in the art, such as calcium chloride treatment, electroporation, and others.

In one embodiment, HA proteins are produced in insect cells. Expression vectors for the transformation of insect cells, and in particular, baculovirus-based expression vectors, are well known in the art. Briefly, baculovirus is a very large DNA virus which produces its coat protein at very high levels. Due to the size of the baculoviral genome, exogenous genes must be placed in the viral genome by recombination.

Accordingly, the components of the expression system include: a transfer vector, usually a bacterial plasmid, which contains both a fragment of the baculovirus genome and a convenient restriction site for insertion of the nucleic acid encoding the HA protein; a wild type baculovirus with a sequence homologous to the baculovirus-specific fragment in the transfer vector (this allows for the homologous recombination of the heterologous gene into the baculovirus genome); and appropriate insect host cells and growth media.

Mammalian expression systems are also known in the art and are used in one embodiment. A mammalian promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the downstream transcription of a coding sequence for HA protein into mRNA. A promoter will have a transcription initiating region, which is usually placed proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base pairs upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream promoter element, typically located within 100 to 200 base pairs upstream of the TATA box. An upstream promoter element determines the rate at which transcription is initiated and can act in either orientation. Of particular use as mammalian promoters are the promoters from mammalian viral genes, since the viral genes are often highly expressed and have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter, and herpes simplex virus promoter.

Typically, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory regions located 3' to the translation stop codon and thus, together with the promoter elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific post-transcriptional cleavage and polyadenylation. Examples of transcription terminator and polyadenylation signals include those derived from [...].

The methods of introducing exogenous nucleic acid into mammalian hosts, as well as other hosts, are well known in the art, and will vary with the host cell used. Techniques include dextran-mediated transfection, calcium phosphate precipitation, polybrene-mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei.

In a preferred embodiment, HA protein is produced in yeast cells. Yeast expression systems are well known in the art, and include expression vectors for Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guilliermondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia lipolytica. Preferred promoter sequences for expression in yeast include the inducible GAL1,10 promoter, the promoters from alcohol dehydrogenase, enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase, hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, pyruvate kinase, and the acid phosphatase gene. Yeast selectable markers include ADE2, HIS4, LEU2, TRP1, and ALG7, which confers resistance to tunicamycin; the G418 resistance gene, which confers resistance to G418; and the CUP1 gene, which allows yeast to grow in the presence of copper ions.

A recombinant HA protein may be expressed intracellularly or secreted. The HA protein may also be made as a fusion protein, using techniques well known in the art. Thus, for example, if the desired epitope is small, the HA protein may be fused to a carrier protein to form an immunogen. Alternatively, the HA protein may be made as a fusion protein to increase expression.

Also included within the definition of HA proteins of the present invention are amino acid sequence variants. These variants fall into one or more of three classes: substitutional, insertional or deletional variants. These variants ordinarily are prepared by site specific mutagenesis of nucleotides in the DNA encoding the HA protein, using cassette mutagenesis or other techniques well known in the art, to produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture as outlined above. However, variant HA protein fragments having up to about 100-150 residues may be prepared by in vitro synthesis using established techniques. Amino acid sequence variants are characterized by the predetermined nature of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation of the HA protein amino acid sequence. The variants typically exhibit the same qualitative biological activity as the naturally occurring analogue, although variants can also be selected which have modified characteristics as will be more fully outlined below.

While the site or region for introducing an amino acid sequence variation is predetermined, the mutation per se need not be predetermined. For example, in order to optimize the performance of a mutation at a given site, random mutagenesis may be conducted at the target codon or region and the expressed HA protein variants screened for the optimal combination of desired activity. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known, for example, M13 primer mutagenesis. Screening of the mutants is done using assays of HA protein activities; for example, mutated HA genes are placed in HA deletion strains and tested for HA activity, as disclosed herein. The creation of deletion strains, given a gene sequence, is known in the art. For example, nucleic acid encoding the variants may be expressed in an adhesion deficient strain, and the adhesion and infectivity of the variant Haemophilus influenzae evaluated. For example, as outlined below, the variants may be expressed in the E. coli DH5α non-adherent strain, and the transformed E. coli strain evaluated for adherence using Chang conjunctival cells.
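As an illustration of mutagenesis at a predetermined target codon, the following Python sketch enumerates every alternative codon at one position of a coding sequence and records the residue each variant would encode. It is a hedged, in silico sketch only: the standard genetic code is assumed, the example sequence is hypothetical and is not an HA sequence, and the function names are introduced here for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order (assumed; '*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence codon by codon, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def saturate_codon(cds, codon_index):
    """Return {variant_sequence: encoded_residue} for all 63 alternative codons at one position."""
    cds = cds.upper()
    start = 3 * codon_index
    original = cds[start:start + 3]
    if len(original) != 3:
        raise ValueError("codon_index is outside the coding sequence")
    variants = {}
    for codon, aa in CODON_TABLE.items():
        if codon != original:
            variants[cds[:start] + codon + cds[start + 3:]] = aa
    return variants


if __name__ == "__main__":
    example = "ATGGCTAAA"  # hypothetical 3-codon sequence (Met-Ala-Lys)
    variants = saturate_codon(example, 1)
    print(translate(example), len(variants), sorted(set(variants.values())))
```

In practice, the expressed variants from such a set would then be screened for activity as described above; the sketch covers only the enumeration step.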

Amino acid substitutions are typically of single residues; insertions usually will be on the order of from about 1 to 20 amino acids, although considerably larger insertions may be tolerated. Deletions range from about 1 to 30 residues, although in some cases deletions may be much larger, as for example when one of the domains of the HA protein is deleted.

Substitutions, deletions, insertions or any combination thereof may be used to arrive at a final derivative. Generally these changes are done on a few amino acids to minimize the alteration of the molecule. However, larger changes may be tolerated in certain circumstances.

When small alterations in the characteristics of the HA protein are desired, substitutions are generally made in accordance with the following chart:

Chart I

Original Residue    Exemplary Substitutions
Ala                 Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu

Substantial changes in function or immunological identity are made by selecting substitutions that are less conservative than those shown in Chart I. For example, substitutions may be made which more significantly affect: the structure of the polypeptide backbone in the area of the alteration, for example the alpha-helical or beta-sheet structure; the charge or hydrophobicity of the molecule at the target site; or the bulk of the side chain. The substitutions which in general are expected to produce the greatest changes in the polypeptide's properties are those in which a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; a cysteine or proline is substituted for (or by) any other residue; a residue having an electropositive side chain, e.g. lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g. glutamyl or aspartyl; or a residue having a bulky side chain, e.g. phenylalanine, is substituted for (or by) one not having a side chain, e.g. glycine.
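The distinction between the exemplary substitutions of Chart I and less conservative substitutions can be expressed as a simple lookup. The Python sketch below encodes Chart I as read from the text, in order, pairing each original residue with its listed substitutions; that pairing is a reconstruction and should be checked against the published chart. The function names and the example peptide are hypothetical.

```python
# Chart I as reconstructed from the text (three-letter residue codes).
CHART_I = {
    "Ala": ["Ser"],
    "Arg": ["Lys"],
    "Asn": ["Gln", "His"],
    "Asp": ["Glu"],
    "Cys": ["Ser"],
    "Gln": ["Asn"],
    "Glu": ["Asp"],
    "Gly": ["Pro"],
    "His": ["Asn", "Gln"],
    "Ile": ["Leu", "Val"],
    "Leu": ["Ile", "Val"],
    "Lys": ["Arg", "Gln", "Glu"],
    "Met": ["Leu", "Ile"],
    "Phe": ["Met", "Leu", "Tyr"],
    "Ser": ["Thr"],
    "Thr": ["Ser"],
    "Trp": ["Tyr"],
    "Tyr": ["Trp", "Phe"],
    "Val": ["Ile", "Leu"],
}


def is_chart_substitution(original, replacement):
    """True if the replacement is listed for the original residue in Chart I."""
    return replacement in CHART_I.get(original, [])


def chart_variants(peptide):
    """List every single-residue variant of a peptide allowed by Chart I.

    peptide is a list of three-letter codes; each result is
    (position, original, replacement, variant_peptide).
    """
    results = []
    for pos, residue in enumerate(peptide):
        for replacement in CHART_I.get(residue, []):
            variant = peptide[:pos] + [replacement] + peptide[pos + 1:]
            results.append((pos, residue, replacement, variant))
    return results


if __name__ == "__main__":
    for entry in chart_variants(["Ala", "Lys"]):  # hypothetical dipeptide
        print(entry)
```

A substitution for which `is_chart_substitution` returns False falls outside the chart and, in the terms used above, would be a candidate for a more substantial change in function or immunological identity.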

The variants typically exhibit the same qualitative biological activity and will elicit the same immune response as the naturally-occurring analogue, although variants also are selected to modify the characteristics of the polypeptide as needed.

Alternatively, the variant may be designed such that the biological activity of the HA protein is altered. For example, the Walker box ATP-binding motif may be altered or eliminated.

In a preferred embodiment, the HA protein is purified or isolated after expression. HA proteins may be isolated or purified in a variety of ways known to those skilled in the art depending on what other components are present in the sample. Standard purification methods include electrophoretic, molecular, immunological and chromatographic techniques, including ion exchange, hydrophobic, affinity, and reverse-phase HPLC chromatography, and chromatofocusing. For example, the HA protein may be purified using a standard anti-HA antibody column.

Ultrafiltration and diafiltration techniques, in conjunction with protein concentration, are also useful. For general guidance in suitable purification techniques, see Scopes, Protein Purification, Springer-Verlag, NY (1982). The degree of purification necessary will vary depending on the use of the HA protein. In some instances no purification will be necessary.

Once expressed and purified if necessary, the HA proteins are useful in a number of applications.

For example, the HA proteins can be coupled, using standard technology, to affinity chromatography columns. These columns may then be used to purify antibodies from samples obtained from animals or patients exposed to the Haemophilus influenzae organism. The purified antibodies may then be used as outlined below.

Additionally, the HA proteins are useful to make antibodies to HA proteins. These antibodies find use in a number of applications. The antibodies are used to diagnose the presence of a Haemophilus influenzae infection in a sample or patient. In a preferred embodiment, the antibodies are used to detect the presence of nontypable Haemophilus influenzae (NTHI), although typable H. influenzae infections are also detected using the antibodies.

This diagnosis will be done using techniques well known in the art; for example, samples such as blood or tissue samples may be obtained from a patient and tested for reactivity with the antibodies, for example using standard techniques such as ELISA. In a preferred embodiment, monoclonal antibodies are generated to the HA protein, using techniques well known in the art. As outlined above, the antibodies may be generated to the full length HA protein, or a portion of the HA protein.

Antibodies generated to HA proteins may also be used in passive immunization treatments, as is known in the art.

Antibodies generated to unique sequences of HA proteins may also be used to screen expression libraries from other organisms to find, and subsequently clone, HA nucleic acids from other organisms.

In one embodiment, the antibodies may be directly or indirectly labelled. By "labelled" herein is meant a compound that has at least one element, isotope or chemical compound attached to enable the detection of the compound. In general, labels fall into three classes: a) isotopic labels, which may be radioactive or heavy isotopes; b) immune labels, which may be antibodies or antigens; and c) colored or fluorescent dyes. The labels may be incorporated into the compound at any position. Thus, for example, the HA protein antibody may be labelled for detection, or a secondary antibody to the HA protein antibody may be created and labelled.

In one embodiment, the antibodies generated to the HA proteins of the present invention are used to purify or separate HA proteins or the Haemophilus influenzae organism from a sample. Thus, for example, antibodies generated to HA proteins which will bind to the Haemophilus influenzae organism may be coupled, using standard technology, to affinity chromatography columns. These columns can be used to pull out the Haemophilus organism from environmental or tissue samples.

In a preferred embodiment, the HA proteins of the present invention are used as vaccines for the prophylactic or therapeutic treatment of a Haemophilus influenzae infection in a patient. By "vaccine" or "immunogenic compositions" herein is meant an antigen or compound which elicits an immune response in an animal or patient. The vaccine may be administered prophylactically, for example to a patient never previously exposed to the antigen, such that subsequent infection by the Haemophilus influenzae organism is prevented. Alternatively, the vaccine may be administered therapeutically to a patient previously exposed or infected by the Haemophilus influenzae organism. While infection cannot be prevented, in this case an immune response is generated which allows the patient's immune system to more effectively combat the infection. Thus, for example, there may be a decrease or lessening of the symptoms associated with infection.

A "patient" for the purposes of the present invention includes both humans and other animals and organisms. Thus the methods are applicable to both human therapy and veterinary applications.

The administration of the HA protein as a vaccine is done in a variety of ways. Generally, the HA proteins can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby therapeutically effective amounts of the HA protein are combined in admixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation are well known in the art. Such compositions will contain an effective amount of the HA protein together with a suitable amount of vehicle in order to prepare pharmaceutically acceptable compositions for effective administration to the host. The composition may include salts, buffers, carrier proteins such as serum albumin, targeting molecules to localize the HA protein at the appropriate site or tissue within the organism, and other molecules. The composition may include adjuvants as well.

In one embodiment, the vaccine is administered as a single dose; that is, one dose is adequate to induce a sufficient immune response to prophylactically or therapeutically treat a Haemophilus influenzae infection. In alternate embodiments, the vaccine is administered as several doses over a period of time, as a primary vaccination and "booster" vaccinations.

By "therapeutically effective amounts" herein is meant an amount of the HA protein which is sufficient to induce an immune response. This amount may be different depending on whether prophylactic or therapeutic treatment is desired. Generally, this ranges from about 0.001 mg to about 1 gm, with a preferred range of about 0.05 to about 0.5 gm. These amounts may be adjusted if adjuvants are used.

The following examples serve to more fully describe the manner of using the above-described invention, as well as to set forth the best modes contemplated for carrying out various aspects of the invention. It is understood that these examples in no way serve to limit the true scope of this invention, but rather are presented for illustrative purposes. All references cited herein are specifically incorporated by reference.

EXAMPLE 1

Cloning of HA1

Many protocols are substantially the same as those outlined in St. Geme et al., Mol. Microbiol. 15(1):77-85 (1995).

Bacterial strains, plasmids, and phages.

Nontypable H. influenzae strain 11 was the clinical isolate chosen as a prototypic HMW1/HMW2-non-expressing strain, although a variety of encapsulated typable strains can be used to clone the protein using the sequences of the figures. The organism was isolated in pure culture from the middle ear fluid of a child with acute otitis media. The strain was identified as H. influenzae by standard methods and was classified as nontypable by its failure to agglutinate with a panel of typing antisera for H. influenzae types a to f (Burroughs Wellcome Co., Research Triangle Park) and failure to show lines of precipitation with these antisera in counterimmunoelectrophoresis assays. Strain 11 adheres efficiently to Chang conjunctival cells in vitro, at levels comparable to those previously demonstrated for NTHI strains expressing HMW1/HMW2-like proteins (data not shown).

Convalescent serum from the child infected with this strain demonstrated an antibody response directed predominantly against surface-exposed high molecular weight proteins with molecular weights greater than 100 kDa.

M13mp18 and M13mp19 were obtained from New England BioLabs, Inc. (Beverly, Mass.). pT7-7 was the kind gift of Stanley Tabor. This vector contains the T7 RNA polymerase promoter φ10, a ribosome-binding site, and the translational start site for the T7 gene 10 protein upstream from a multiple cloning site.

Molecular cloning and plasmid subcloning.

The recombinant phage containing the HA1 gene was isolated and characterized using methods similar to those described previously. In brief, chromosomal DNA from strain 11 was prepared, and Sau3A partial restriction digests of the DNA were prepared and fractionated on 0.7% agarose gels. Fractions containing DNA fragments in the 9- to 20-kbp range were pooled, and a library was prepared by ligation into λEMBL3 arms. Ligation mixtures were packaged in vitro with Gigapack (Stratagene) and plate-amplified in a P2 lysogen of E. coli LE392. Lambda plaque immunological screening was performed as described by Maniatis et al., Molecular Cloning: A Laboratory Manual, 2d Ed. (1989), Cold Spring Harbor Press. For plasmid subcloning studies, DNA from recombinant phage was subcloned into the T7 expression plasmid pT7-7. Standard methods were used for manipulation of cloned DNA as described by Maniatis et al. (supra).
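The partial digestion and size selection used to build the library can be modelled in silico. The Python sketch below is an illustrative simulation under stated assumptions and is not the laboratory protocol: it assumes the enzyme recognizes GATC and cuts immediately 5' of the G, treats a partial digest as each site being cut independently with a user-chosen probability (a simplification), and keeps fragments in the 9 to 20 kbp window. All function names are hypothetical.

```python
import random

SAU3A_SITE = "GATC"  # assumed recognition site; cut placed immediately 5' of the G


def cut_positions(seq):
    """Indices at which the recognition site starts (overlapping-safe scan)."""
    seq = seq.upper()
    positions = []
    i = seq.find(SAU3A_SITE)
    while i != -1:
        positions.append(i)
        i = seq.find(SAU3A_SITE, i + 1)
    return positions


def digest(seq, positions):
    """Cut seq at the given positions and return the non-empty fragments in order."""
    bounds = [0] + sorted(positions) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def partial_digest(seq, cut_prob, rng):
    """Simplified partial digest: each site is cut independently with probability cut_prob."""
    chosen = [p for p in cut_positions(seq) if rng.random() < cut_prob]
    return digest(seq, chosen)


def size_select(fragments, min_bp=9_000, max_bp=20_000):
    """Keep fragments whose length falls in the pooled size window."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical random 200 kbp sequence standing in for chromosomal DNA.
    genome = "".join(rng.choice("ACGT") for _ in range(200_000))
    fragments = partial_digest(genome, cut_prob=0.02, rng=rng)
    pooled = size_select(fragments)
    print(len(fragments), "fragments;", len(pooled), "in the 9-20 kbp window")
```

Varying `cut_prob` shows why a partial rather than complete digest is used: with a four-base recognition site, cutting every site yields fragments far smaller than the insert size required by the vector arms.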

Plasmid pHMW8-3 was generated by isolating an 11 kbp XbaI fragment from purified DNA from recombinant phage clone 11-17 and ligating into XbaI cut pT7-7. Plasmid pHMW8-4 was generated by isolating a 10 kbp BamHI-ClaI [...] cut pT7-7. Plasmid pHMW8-5 was generated by digesting plasmid pHMW8-3 DNA with ClaI, isolating the larger fragment and religating. Plasmid pHMW8-6 was generated by digesting pHMW8-4 with SpeI, which cuts at a unique site within the HA1 gene, blunt-ending the resulting fragment, and inserting a kanamycin resistance cassette into the SpeI site. Plasmid pHMW8-7 was generated by digesting pHMW8-3 with NruI and HindIII, isolating the fragment containing pT7-7, blunt-ending and religating. The plasmid restriction maps are shown in Figure 6.

DNA sequence analysis.

DNA sequence analysis was performed by the dideoxy method with the U.S. Biochemicals Sequenase kit as suggested by the manufacturer. [35S]dATP was purchased from New England Nuclear (Boston, Mass.). Data were analyzed with Compugene software and the Genetics Computer Group program from the University of Wisconsin on a Digital VAX 8530 computer. Several 21-mer oligonucleotide primers were generated as necessary to complete the sequence.

Adherence assays.

Adherence assays were done with Chang epithelial cells [Wong-Kilbourne derivative, clone 1-5c-4 (human conjunctiva), ATCC CCL20.2], which were seeded into wells of 24-well tissue culture plates, as described (St. Geme III et al., Infect. Immun. 58:4036 (1990)). Bacteria were inoculated into broth and allowed to grow to a density of approximately 2 x 10' colony-forming units per ml. Approximately 2 x 10' colony-forming units were inoculated onto epithelial cell monolayers, and plates were gently centrifuged at 165 x g for 5 min to facilitate contact between bacteria and the epithelial surface. After incubation for 30 min at 37°C in 5% CO2, monolayers were rinsed five times with phosphate buffered saline (PBS) to remove nonadherent organisms and were treated with trypsin-EDTA (0.05% [...] EDTA) in PBS to release them from the plastic support. Well contents were agitated, and dilutions were plated on solid medium to yield the number of adherent bacteria per monolayer. Percent adherence was calculated by dividing the number of adherent colony-forming units per monolayer by the number of inoculated colony-forming units.
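The percent adherence calculation described above is a simple ratio, sketched here in Python. The back-calculation from a colony count on a dilution plate to colony-forming units per monolayer is an assumed, conventional plate-count formula and is not spelled out in the text; the function names and all numbers in the example are hypothetical and are not data from the experiments.

```python
def cfu_per_monolayer(colonies, dilution_factor, plated_volume_ml, well_volume_ml):
    """Estimate total CFU recovered from one well from a single dilution plate.

    Assumed formula: colonies x dilution factor / volume plated x total well volume.
    """
    if plated_volume_ml <= 0 or well_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    return colonies * dilution_factor / plated_volume_ml * well_volume_ml


def percent_adherence(adherent_cfu, inoculated_cfu):
    """Adherent CFU per monolayer divided by inoculated CFU, expressed as a percentage."""
    if inoculated_cfu <= 0:
        raise ValueError("inoculated_cfu must be positive")
    return 100.0 * adherent_cfu / inoculated_cfu


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the arithmetic.
    adherent = cfu_per_monolayer(colonies=50, dilution_factor=1000,
                                 plated_volume_ml=0.1, well_volume_ml=1.0)
    print(round(percent_adherence(adherent, inoculated_cfu=1.0e7), 2))
```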

Isolation and characterization of recombinant phage expressing the strain 11 high molecular weight adhesion protein.

The nontypable Haemophilus influenzae strain 11 chromosomal DNA library was screened immunologically with convalescent serum from the child infected with strain 11. Immunoreactive clones were screened by Western blot for expression of high molecular weight proteins with apparent molecular weights greater than 100 kDa, and two different classes of recombinant clones were recovered. A single clone designated 11-17 was recovered which expressed the HA1 protein. The recombinant protein expressed by this clone had an apparent molecular weight of greater than 200 kDa.

Transformation into E. coli

Plasmids were introduced into the DH5α strain of E. coli (Maniatis, supra), which is a non-adherent strain, using electroporation (Dower et al., Nucl. Acids Res. 16:6127 (1988)). The results are shown in Table 1.

Table 1

Strain              % Adherence(a)
DH5α(pHMW8-4)       43.3 ± [...]%
DH5α(pHMW8-5)       41.3 ± 3.3%
DH5α(pHMW8-6)       0.6 ± 0.3%
DH5α(pHMW8-7)       [...]
DH5α(pT7-7)         0.4 ± 0.1%

(a) Adherence was measured in a 30 minute assay and was calculated by dividing the number of adherent bacteria by the number of inoculated bacteria. Values are the mean ± SEM of measurements made in triplicate from a representative experiment.

In addition, a monoclonal antibody made by standard procedures, directed against the strain 11 protein, recognized proteins in 57 of 60 epidemiologically-unrelated NTHI. However, Southern analysis using the gene indicated that roughly only [...] of the tested strains actually hybridized to the gene (data not shown).
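The Table 1 footnote reports each value as the mean ± SEM of triplicate measurements. A minimal Python sketch of that summary statistic follows; it assumes the SEM is the sample standard deviation divided by the square root of the number of replicates, which is the conventional definition but is not stated in the text. The triplicate values in the example are hypothetical and are not the measurements behind Table 1.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicate measurements are required")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


if __name__ == "__main__":
    triplicate = [40.0, 43.0, 46.0]  # hypothetical percent-adherence replicates
    mean, sem = mean_sem(triplicate)
    print(f"{mean:.1f} ± {sem:.1f}%")
```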

EXAMPLE 2

Cloning of HA2

In a recent study we examined a series of H. influenzae type b isolates by transmission electron microscopy and visualized short, thin surface fibrils distinct from pili (St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85). In that study, the large genetic locus involved in the expression of these appendages was isolated.

Bacterial strains and plasmids

H. influenzae strain C54 is a type b strain that has been described previously (Pichichero, P. Anderson, M. Loeb, and D.H. Smith. 1982. Do pili play a role in pathogenicity of Haemophilus influenzae type b? Lancet ii:960-962). Strain C54-Tn400.23 is a mutant that contains a mini-Tn10 kan element in the hsf locus and demonstrates minimal in vitro adherence (St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85). Strains 1053, 1058, 1060, 1063, 1065, 1069, 1070, 1076, 1081, and 1084 are H. influenzae type b isolates generously provided by J. Musser (Baylor University, Houston, Texas) (Musser et al., 1990. Global genetic structure and molecular epidemiology of encapsulated Haemophilus influenzae. Rev. Infect. Dis. 12:75-111). H. influenzae strains SM4 (type [...]), SM6 (type [...]), SM7 (type [...]), and SM72 (type c) are type strains obtained from R. Facklam at the Centers for Disease Control (Atlanta, Georgia). Strains 142, 327, and 351 are H. influenzae type e isolates, and strains 134, 219, 256, and 501 are H. influenzae type f isolates obtained from H. Kayhty (Finnish National Public Health Institute, Helsinki). Strain Rd (type d) and the 15 nontypable isolates examined by Southern analysis have been described previously (Alexander et al., J. Exp. Med. 83:345-359 (1951); Barenkamp et al., Infect. Immun. 60:1302-1313 (1992)).

E. coli DH5α is a nonadherent laboratory strain that was originally obtained from Gibco BRL. E. coli strain BL21(DE3) was a gift from F.W. Studier and contains a single copy of the T7 RNA polymerase gene under the control of the lac regulatory system (Studier, F.W., and B.A. Moffatt. 1986. Use of bacteriophage T7 RNA polymerase to direct high-level expression of cloned genes. J. Mol. Biol. 189:113-130). Plasmid pT7-7 was provided by S. Tabor and contains the T7 RNA polymerase promoter φ10, a ribosome-binding site, and the translational start site for the T7 gene 10 protein upstream from a multiple cloning site (Tabor, S., and C.C. Richardson. 1985. A bacteriophage T7 RNA polymerase/promoter system for controlled exclusive expression of specific genes. Proc. Natl. Acad. Sci. USA 82:1074-1078). pUC19 is a high-copy-number plasmid that has been previously described (Yanisch-Perron et al., Gene 33:103-119 (1985)).

pDC400 is a pUC19 derivative that harbors the H. influenzae strain C54 surface fibril locus and is sufficient to promote in vitro adherence by laboratory strains of E. coli (St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85). pHMW8-5 is a pT7-7 derivative that contains the H. influenzae strain 11 hia locus and also promotes adherence by nonadherent laboratory strains of E. coli (Barenkamp, S.J., and J.W. St. Geme III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press). pHMW8-6 contains the H. influenzae hia locus interrupted by a kanamycin cassette (Barenkamp, S.J., and J.W. St. Geme III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press). pUC4K served as the source of the kanamycin-resistance gene that was used as a probe in Southern analysis (Vieira and J. Messing. 1982. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19:259-268).

Culture conditions

H. influenzae strains were grown on chocolate agar supplemented with 1% IsoVitaleX, on brain heart infusion agar supplemented with hemin and NAD (BHI-DB agar), or in brain heart infusion broth supplemented with hemin and NAD (BHIs) (Anderson, R.B. Johnston, Jr., and D.H. Smith. 1972. Human serum activity against Haemophilus influenzae type b. J. Clin. Invest. 51:31-38). These strains were stored at -80°C in brain heart infusion broth with 25% glycerol. E. coli strains were grown on Luria Bertani (LB) agar or in LB broth and were stored at [...] in LB broth with 50% glycerol. For H. influenzae, kanamycin was used in a concentration of 25 µg/ml. Antibiotic concentrations for E. coli included the following: ampicillin or carbenicillin 100 µg/ml and kanamycin 50 µg/ml.

Induction of plasmid-encoded proteins

To identify plasmid-encoded proteins, the bacteriophage T7 expression vector pT7-7 was employed, and the relevant pT7-7 derivatives were transformed into E. coli BL21(DE3). Activation of the T7 promoter was achieved by inducing expression of T7 RNA polymerase with isopropyl-β-D-thiogalactopyranoside (final concentration, 1 mM). After induction for 30 minutes at 37°C, rifampicin was added to a final concentration of 200 µg/ml. Thirty minutes later, 1 ml of culture was pulsed with 50 µCi of trans-[35S]-label (ICN, Irvine, Calif.) for 5 minutes. Bacteria were harvested, and whole cell lysates were resuspended in Laemmli buffer for analysis by sodium dodecyl sulfate-polyacrylamide gel electrophoresis on acrylamide gels (Laemmli, U.K. 1970. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature (London) 227:680-685).

Autoradiography was performed with Kodak XAR-5 film.

Recombinant DNA methods

DNA ligations, restriction endonuclease digestions, and gel electrophoresis were performed according to standard techniques (Sambrook, E.F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor). Plasmids were introduced into E. coli strains by either chemical transformation or electroporation, as described (Dower, J.F. Miller, and C.W. Ragsdale. 1988. High efficiency transformation of E. coli by high voltage electroporation. Nucleic Acids Res. 16:6127-614; Sambrook, E.F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor). Transformation in H. influenzae was performed using the MIV method of Herriott et al. (Herriott, E.M. Meyer, and M. Vogt. 1970. Defined nongrowth media for stage II competence in Haemophilus influenzae. J. Bacteriol. 101:517-524).

Adherence assays

Adherence assays were performed with tissue culture cells which were seeded into wells of 24-well tissue culture plates as previously described (St. Geme et al., Infect. Immun. 58:4036-4044 (1991)). Adherence was measured after incubating bacteria with epithelial monolayers for 30 minutes as described (St. Geme, J.W., III, S. Falkow, and S.J. Barenkamp. 1993. High-molecular-weight proteins of nontypable Haemophilus influenzae mediate attachment to human epithelial cells. Proc. Natl. Acad. Sci. U.S.A. 90:2875-2879). Tissue culture cells included Chang epithelial cells (Wong-Kilbourne derivative, clone 1-5c-4 (human conjunctiva)) (ATCC CCL 20.2), KB cells (human oral epidermoid carcinoma) (ATCC CCL 17), HEp-2 cells (human laryngeal epidermoid carcinoma) (ATCC CCL 23), A549 cells (human lung carcinoma) (ATCC CCL 185), Intestine 407 cells (human embryonic intestine) (ATCC CCL), HeLa cells (human cervical epithelioid carcinoma) (ATCC CCL), ME-180 cells (human cervical epidermoid carcinoma) (ATCC HTB 33), HEC-1B cells (human endometrium) (ATCC HTB 113), and CHO-K1 cells (Chinese hamster ovary) (ATCC CCL 61). Chang, KB, Intestine 407, HeLa, and HEC-1B cells were maintained in modified Eagle medium with Earle's salts and non-essential amino acids. HEp-2 cells were maintained in Dulbecco's modified Eagle medium, A549 cells and CHO-K1 cells in F12 medium (Ham), and ME-180 cells in medium. All media were supplemented with 10% heat-inactivated fetal bovine serum.

Southern analysis

Southern blotting was performed using high stringency conditions as previously described (St. Geme, J.W., III, and S. Falkow. 1991. Loss of capsule expression by Haemophilus influenzae type b results in enhanced adherence to and invasion of human cells. Infect. Immun. 59:1325-1333).

Microscopy

Samples of epithelial cells with associated bacteria were stained with Giemsa stain and examined by light microscopy as described (St. Geme, J.W., III, and S. Falkow. 1990. Haemophilus influenzae adheres to and enters cultured human epithelial cells. Infect. Immun. 58:4036-4044).

For negative-staining electron microscopy, bacteria were stained with 0.5% aqueous uranyl acetate (St. Geme, J.W., III, and S. Falkow. 1991. Loss of capsule expression by Haemophilus influenzae type b results in enhanced adherence to and invasion of human cells. Infect. Immun. 59:1325-1333) and examined using a Zeiss microscope.

The previous study indicated that laboratory E. coli strains harboring the plasmid pDC400 were capable of efficient attachment to cultured human epithelial cells (St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85). Subcloning studies and transposon mutagenesis indicated that the relevant coding region of pDC400 was present within an 8.3 kb XbaI fragment (St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85) (Figure). To confirm this conclusion, in the present study this XbaI fragment was subcloned into pT7-7, generating plasmids designated pDC601 and pDC602, which contained the insert in opposite orientations (Figure). As predicted, expression of these plasmids in E. coli DH5α was associated with a capacity for high level in vitro attachment (Table 1).

Table 1. Adherence to Chang conjunctival cells.

Strain               Adherence (% inoculum)(a)
DH5α/pT7-7           0.4 ± 0.1
DH5α/pDC400          25.3 ± 1.2
DH5α/pDC601          54.3
DH5α/pDC602          55.5 ± 4.3
C54b-p-              98.7
C54-HA1::kan(b)      1.5 ± 0.2
C54-Tn400.23(c)      3.3 ± 0.4

(a) Adherence was measured in a 30 minute assay and was calculated by dividing the number of adherent bacteria by the number of inoculated bacteria. Values are the mean ± SEM of measurements made in triplicate from representative experiments.

(b) Strain C54-HA1::kan was constructed by transforming C54b-p- with linearized pHMW8-6, which contains the HA1 gene with an intragenic kanamycin cassette.

(c) Strain C54-Tn400.23 contains a mini-Tn10 kan element in the hsf locus (St. Geme et al., Mol. Microbiol. 15:77-85 (1995)).
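The adherence values in Table 1 follow from the definition given in the table footnote: adherent bacteria divided by inoculated bacteria, expressed as a percentage, then averaged over triplicate measurements with the standard error of the mean. A minimal Python sketch of that arithmetic is given below. The function names are illustrative, and the colony counts in the usage note are hypothetical rather than data from this specification.

```python
import math


def percent_adherence(adherent_cfu, inoculum_cfu):
    """Adherent bacteria as a percentage of the inoculated bacteria."""
    if inoculum_cfu <= 0:
        raise ValueError("inoculum must be positive")
    return 100.0 * adherent_cfu / inoculum_cfu


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = sum(values) / n
    # Sample variance uses n - 1 in the denominator.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)
```

For example, three hypothetical wells yielding 5.0e5, 5.5e5 and 6.0e5 adherent bacteria from an inoculum of 1.0e6 give percentages of 50, 55 and 60, that is, a mean of 55 with an SEM of about 2.9.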

To determine the direction of transcription and identify plasmid-encoded proteins, pDC601 and pDC602 were subsequently introduced into E. coli BL21(DE3), producing BL21(DE3)/pDC601 and BL21(DE3)/pDC602, respectively. As a negative control, pT7-7 was also transformed into BL21(DE3). The T7 promoter in these three strains was induced with IPTG, and induced proteins were detected using trans-[35S]-label. As shown in Figure 8, induction of BL21(DE3)/pDC601 resulted in expression of a large protein over 200 kDa in size along with several slightly smaller proteins, which presumably represent degradation products. In contrast, when BL21(DE3)/pDC602 and BL21(DE3)/pT7-7 were induced, there was no expression of these proteins. This experiment indicated that the genetic material contained in the 8.3 kb XbaI fragment is transcribed from left to right as shown in Figure 7 and suggested that a single long open reading frame may be present.

Nucleotide sequencing

Nucleotide sequence was determined using a Sequenase kit and double-stranded plasmid template. DNA fragments were subcloned into pUC19 and sequenced along both strands by primer walking. DNA sequence analysis was performed using the Genetics Computer Group (GCG) software package from the University of Wisconsin (Devereux, P. Haeberli, and O. Smithies. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395). Sequence similarity searches were carried out using the BLAST program of the National Center for Biotechnology Information (Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410).

Sequencing of the 8.3 kb XbaI fragment revealed a 7059 bp gene, which is designated for literature purposes as hsf, for Haemophilus surface fibrils, and is referred to herein as HA2. This gene encodes a 2353-amino acid polypeptide, referred to as Hsf or HA2, with a calculated molecular mass of 243.8 kDa, which is similar in size to the observed protein species detected after induction of BL21(DE3)/pDC601. The HA2 gene has a GC content of 42.8%, somewhat greater than the published estimate of 38-39% for the whole genome (Fleischmann et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512; Kilian, M. 1976. A taxonomic study of the genus Haemophilus, with proposal of a new species. J. Gen. Microbiol. 93:9-62). A putative ribosomal binding site with the sequence AAGGTA begins 13 base pairs upstream of the presumed initiation codon. A sequence similar to a rho-independent transcription terminator is present beginning 20 nucleotides beyond the stop codon and contains interrupted inverted repeats with the potential for forming a hairpin structure containing a loop of two bases and a stem of 11 bases. Of note, a string of 29 thymines spans the region from 149 to 121 nucleotides upstream of HA2.
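The gene length, polypeptide length and GC content quoted above are simple functions of the nucleotide sequence and its coding coordinates. The Python sketch below shows one way such figures might be computed. The helper names are illustrative assumptions, and the code is not the GCG software cited in this specification.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of a DNA sequence."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous bases")
    gc = sum(1 for b in counted if b in "GC")
    return 100.0 * gc / len(counted)


def codon_count(cds_start, cds_end):
    """Number of codons in a CDS given 1-based inclusive coordinates."""
    length = cds_end - cds_start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3
```

With the CDS location listed for SEQ ID NO:3 in the sequence listing (163 to 7221), the length is 7221 - 163 + 1 = 7059 nucleotides, and `codon_count(163, 7221)` gives 2353, in agreement with the 7059 bp gene and 2353-residue polypeptide described above.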

Homology to Hia/HA1

The nontypable H. influenzae nonpilus protein HA1 (called Hia in the literature) promotes attachment to cultured human epithelial cells as outlined above. Comparison of the predicted amino acid sequence of HA2 and the sequence of HA1 revealed 81% similarity and 72% identity overall. As depicted in Figure 5, the two sequences are highly conserved at their N-terminal and C-terminal ends, and both contain a Walker box nucleotide-binding motif. Interestingly, HA1 is encoded by a 3.2 kb gene and is only 115 kDa. In this context, it is noteworthy that three separate stretches of HA2 (corresponding to amino acids 174 to 608, 847 to 1291, and 1476 to 1914, respectively) show significant homology to the region of HA1 defined by amino acids 221 to 658 (Figure). Table 2 summarizes the level of similarity and identity between these three stretches of HA2 and one another. The suggestion is that the larger size of HA2 may relate in part to the presence of a repeated domain which is present in single copy in HA1.

Table 2. Percent similarity and percent identity between HA2 repeats.

Percent similarity/percent identity

                     HA2 174-608(a)    HA2 847-1291(a)    HA2 1476-1914(a)
HA2 174-608          -                 65/53              76/60
HA2 847-1291                           -                  70/56
HA2 1476-1914                                             -

(a) Numbers correspond to amino acid residue positions in the full-length HA2 (Hsf) protein.
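Percent identity and percent similarity figures such as those in Table 2 are derived from pairwise alignments. The Python sketch below illustrates one common way of scoring two sequences that have already been aligned. The conservative-substitution groups are an assumption chosen for illustration, and the GCG programs cited in this specification may use a different scoring scheme and different gap handling.

```python
# Assumed conservative-substitution groups (illustrative only).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"),
                  set("NQ"), set("ST"), set("AG")]


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Percent identity and percent similarity over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # columns containing a gap are not scored
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in g and b in g for g in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared
```

Because identical residues are counted as similar, the similarity figure is never lower than the identity figure, which matches the pattern of the values reported in Table 2.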

To evaluate whether HA1 and HA2 are alleles of the same locus, a series of Southern blots were performed. Samples of chromosomal DNA from strains C54 and 11 were subjected to digestion with BglII, ClaI, and either PstI or XbaI. Resulting DNA fragments were separated by agarose electrophoresis and transferred bidirectionally to nitrocellulose membranes. One membrane was probed with a 3.3 kb internal fragment of the HA2 gene (Figure), and the other membrane was probed with a 1.6 kb intragenic fragment of the HA1 gene. As shown in Figure 9, both probes recognized exactly the same chromosomal fragments.

To obtain additional evidence that the HA2 and HA1 genes are homologs, the inactivation of HA2 by transformation of H. influenzae strain C54b-p- with insertionally inactivated HA1 was attempted. The plasmid pHMW8-6 (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press), which contains the HA1 gene with an intragenic kanamycin cassette, was linearized with NdeI and introduced into competent C54. Southern hybridization confirmed insertion of the kanamycin cassette into HA2 (not shown). Furthermore, examination of the C54 mutant by negative staining transmission electron microscopy revealed the loss of surface fibrils (not shown). Consistent with these findings, the mutant strain demonstrated minimal attachment to Chang conjunctival cells (Table 1).

In additional experiments, the cellular binding specificities conferred by the HA2 and HA1 proteins were compared. As shown in Figure 10, DH5α/pDC601 (expressing HA2) demonstrated high level attachment to Chang cells, KB cells, HeLa cells, and Intestine 407 cells, moderate level attachment to HEp-2 cells, and minimal attachment to HEC-1B cells, ME-180 cells, and CHO-K1 cells. DH5α expressing HA1 showed virtually the same pattern of attachment. Giemsa staining and subsequent examination by light microscopy confirmed these viable count adherence assay results.

Homology to other bacterial extracellular proteins

A protein sequence similarity search was performed with the HA2 predicted amino acid sequence using the BLAST network service of the National Center for Biotechnology Information (Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410). This search revealed low-level sequence similarity to a series of other bacterial adherence factors, including HMW1 and HMW2 (the proteins previously identified as being important adhesins in HA1-deficient nontypable H. influenzae strains; St. Geme, J.W., III, S. Falkow, and S.J. Barenkamp. 1993. High-molecular-weight proteins of nontypable Haemophilus influenzae mediate attachment to human epithelial cells. Proc. Natl. Acad. Sci. U.S.A. 90:2875-2879), AIDA-I (an adhesion protein expressed by some diarrheagenic E. coli strains; Benz and M.A. Schmidt. 1992. AIDA-I, the adhesin involved in diffuse adherence of the diarrhoeagenic Escherichia coli strain 2787 (O126:H27), is synthesized via a precursor molecule. Mol. Microbiol. 6:1539-1546), and Tsh (a hemagglutinin produced by an avian pathogenic E. coli strain; Provence, D., and R. Curtiss III. 1994. Isolation and characterization of a gene involved in hemagglutination by an avian pathogenic Escherichia coli strain. Infect. Immun. 62:1369-138). In addition, HA2 showed homology to SepA, a Shigella flexneri secreted protein that appears to play a role in tissue invasion (Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot. 1995. SepA, the major extracellular protein of Shigella flexneri: autonomous secretion and involvement in tissue invasion. Mol. Microbiol. 17:123-135). Alignment of HA2 with HMW1, HMW2, AIDA-I, Tsh, and SepA revealed a highly conserved N-terminal domain (Figure 11). In AIDA-I, Tsh, and SepA, this N-terminal extremity precedes a typical procaryotic signal sequence (Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot. 1995. SepA, the major extracellular protein of Shigella flexneri: autonomous secretion and involvement in tissue invasion. Mol. Microbiol. 17:123-135). Similarly, in HA2 this conserved domain precedes a 26 amino acid segment that is characterized by a positively charged region, followed by a string of hydrophobic residues, and then alanine-glutamine-alanine.

Presence of an HA2 homolog in other encapsulated and nonencapsulated strains

Previous work demonstrated that an HA2 homolog is present in H. influenzae type b strains M42 and Eagan (St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85). To define the extent to which the HA2 locus is shared by other type b strains, a panel of evolutionarily diverse type b isolates was examined by Southern analysis. Among these strains were six belonging to phylogenic division I and four belonging to phylogenic division II (Musser, J.S. Kroll, E.R. Moxon, and R.K. Selander. 1988. Evolutionary genetics of the encapsulated strains of Haemophilus influenzae. Proc. Natl. Acad. Sci. U.S.A. 85:7758-7762). Chromosomal DNA was digested with BglII and then probed with the intragenic 3.3 kb fragment of the HA2 gene. As shown in Figure 12, all 10 strains showed hybridization. The universal presence of this locus among H. influenzae type b strains raised the question of its prevalence in non-type b encapsulated H. influenzae. Southern analysis of a series of type a, c, d, e, and f isolates again demonstrated a homolog in all cases (Figure 13).

Recently Fleischmann et al. (Fleischmann, R.D., et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512) reported the genome sequence of H. influenzae strain Rd, which was one of the two serotype d strains examined by Southern analysis. In accord with the Southern blotting results, a search of the Rd genome revealed an open reading frame with striking sequence similarity to HA2. The Rd gene is 894 nucleotides in length and is predicted to encode a protein of 298 amino acids. Overall, the Rd locus is 70% identical to the C54 HA2 gene, and the Rd derived amino acid sequence is 62% identical and similar to C54 HA2. Interestingly, the Rd open reading frame appears to be truncated due to a "premature" stop codon.
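A truncated open reading frame of this kind can be recognized by reading codons from the start codon onward and noting where the first in-frame stop codon falls. The Python sketch below does this. The function names and the toy sequence in the usage note are hypothetical, and only the three standard stop codons are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(dna, start=0):
    """Return the 0-based index of the first in-frame stop codon, or None.

    `start` is the 0-based index of the first base of the start codon.
    """
    dna = dna.upper()
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return i
    return None


def orf_length_in_codons(dna, start=0):
    """Number of sense codons before the first in-frame stop codon."""
    stop = first_in_frame_stop(dna, start)
    if stop is None:
        return None
    return (stop - start) // 3
```

For the toy sequence ATGAAATAGGGG, the codons read ATG, AAA, TAG, so `first_in_frame_stop` returns 6 and `orf_length_in_codons` returns 2. Comparing the resulting length with that of a full-length homolog is one way to flag a stop codon as premature.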

Previous experiments revealed that 13 of 15 nontypable strains lacking an HMW1/HMW2-related protein had evidence of an HA1 homolog (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press). Consistent with the demonstration that HA2 and HA1 are homologous, Southern analysis of these 15 strains, probing with the 3.3 kb fragment of hsf, demonstrated hybridization in 12 of the same 13 (not shown).

Chromosomal location of the HA2 locus

In earlier work, the HA1 locus in nontypable strain 11 was found to be flanked upstream by an open reading frame with significant homology to E. coli exoribonuclease II (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press). Similarly, the HA2 locus in strain C54 is flanked on the 5' side by an open reading frame with similarity to E. coli exoribonuclease II. This gene terminates 357 base pairs before the HA2 start codon and encodes a protein with a predicted amino acid sequence that is 61% similar and 33% identical at its C-terminal end to exoribonuclease II. Of note, the Rd HA2 homolog is also flanked upstream by the exoribonuclease II locus.

EXAMPLE 3

Cloning of HA3

Recombinant phage containing the nontypable Haemophilus strain 32 HA3 gene were isolated and characterized using methods modified slightly from those described previously (Barenkamp and St. Geme, Molecular Microbiology 1996, in press). In brief, chromosomal DNA from strain 32 was prepared by a modification of the method of Marmur (Marmur, 1961). Sau3A partial restriction digests of the DNA were prepared and fractionated on 0.7% agarose gels. Fractions containing DNA fragments in the 9- to 20-kbp range were pooled, and a library was prepared by ligation into λEMBL3 arms. Ligation mixtures were packaged in vitro with Gigapack® (Stratagene, La Jolla, CA) and plate amplified in a P2 lysogen of E. coli LE392.
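The size selection described above can be illustrated with a small in silico digest. The Python sketch below locates Sau3A recognition sites (GATC, cut immediately 5' of the G) in a linear sequence, lists the fragment lengths of a complete digest, and tests whether a fragment falls in the 9- to 20-kbp window. It is a simplified illustration only: a partial digest, as used here, yields fragments spanning several adjacent sites, and the function names are assumptions rather than part of any cited method.

```python
def find_sites(dna, site="GATC"):
    """0-based start positions of every occurrence of a recognition site."""
    dna = dna.upper()
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions


def complete_digest_lengths(dna, site="GATC"):
    """Fragment lengths of a linear molecule cut just 5' of each site."""
    cuts = [c for c in find_sites(dna, site) if c > 0]
    edges = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(edges, edges[1:])]


def in_size_window(length_bp, low=9000, high=20000):
    """True if a fragment length lies within the pooled size range."""
    return low <= length_bp <= high
```

For the toy sequence AAGATCTTGATCGG, sites are found at positions 2 and 8 and a complete digest gives fragments of 2, 6 and 6 bases.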

Lambda plaque screening was performed using a mixture of three PCR products derived from strain 32 chromosomal DNA. These PCR products were amplified using primer pairs previously shown to amplify DNA segments at the 5' end of the strain 11 HA1 gene. The primers were as follows:

Primer designation    Strand      Sequence
44P                   positive    CCG TGC TTG CCC AAC ACG CTT
64P                   positive    GCT GCC ACC TTG CAC AAC AAC
93G-2                 positive    CTT TCA ATG CCA GAA AGT AGG
118T-1                negative    CTT CAA CCG TTG CGG ACA ACA

Each of the positive strand primers was used with the single negative strand primer to generate the three fragments used for probing the library.
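Because the negative strand primer anneals to the opposite strand, its reverse complement gives the corresponding positive strand sequence, which is convenient when locating the primers on a sequence. The Python sketch below computes a reverse complement and a rough melting temperature using the Wallace rule (2 °C per A or T, 4 °C per G or C). The Wallace rule is an approximation for short oligonucleotides and is not stated in this specification, and the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(primer):
    """Remove the spaces used to group the primer into triplets."""
    return primer.replace(" ", "").upper()


def reverse_complement(primer):
    """Reverse complement of a DNA primer."""
    return clean(primer).translate(_COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Approximate melting temperature (deg C) by the Wallace rule."""
    seq = clean(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Applied to the negative strand primer CTT CAA CCG TTG CGG ACA ACA, `reverse_complement` gives TGTTGTCCGCAACGGTTGAAG, and `wallace_tm` gives an estimate of 64 °C (10 A/T and 11 G/C bases).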

The PCR products generated from strain 11 and strain 32 chromosomal DNA were identical in size, suggesting that the nucleotide sequences of these chromosomal regions were similar in the two strains. Plaque screening was performed using standard methodology (Berger and Kimmel, 1987) at high stringency: final wash conditions were 65°C for 1 hour in buffer containing 2X SSC and 1% SDS. Positive plaques were identified by autoradiography and plaque purified, and phage DNA was purified by standard methods. The same primer pairs used to generate the screening probes were then used to localize the HA3 gene by amplifying various restriction fragments derived from the phage DNA. Once localized, the strain 32 HA3 gene and flanking DNA were sequenced using standard methods.

In order to construct strain 32 isogenic Haemophilus influenzae mutants deficient in expression of the HA3 gene, bacteria were made competent using the MIV method (Herriott et al. 1970) and were transformed with linearized pHMW8-6, selecting for kanamycin resistance. Allelic exchange was confirmed by Southern analysis. The mutants that no longer expressed HA3 exhibited a marked decrease in binding to Chang epithelial cells, using the methods outlined above (data not shown).

Expression in non-adherent strains of E. coli did not result in adherence, although it has not been confirmed that the protein was actually expressed.

Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.

The reference to any prior art in this specification is not, and should not be taken as, an acknowledgment or any form of suggestion that that prior art forms part of the common general knowledge in Australia.

SEQUENCE LISTING

GENERAL INFORMATION:

APPLICANT: Washington University
(ii) TITLE OF INVENTION: HAEMOPHILUS ADHESION PROTEINS
(iii) NUMBER OF SEQUENCES: 19
(iv) CORRESPONDENCE ADDRESS:
    ADDRESSEE: Flehr, Hohbach, Test, Albritton Herbert
    STREET: Four Embarcadero Center, Suite 3400
    CITY: San Francisco
    STATE: California
    COUNTRY: United States
    ZIP: 94111-4187
COMPUTER READABLE FORM:
    MEDIUM TYPE: Floppy disk
    COMPUTER: IBM PC compatible
    OPERATING SYSTEM: PC-DOS/MS-DOS
    SOFTWARE: PatentIn Release Version #1.30
(vi) CURRENT APPLICATION DATA:
    APPLICATION NUMBER: UNKNOWN
    FILING DATE: 22-MAR-1996
    CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
    APPLICATION NUMBER: US 08/409,995
    FILING DATE: 24-MAR-1995
(viii) ATTORNEY/AGENT INFORMATION:
    NAME: Silva, Robin M.
    REGISTRATION NUMBER: 38,304
    REFERENCE/DOCKET NUMBER: FP61053-1/RFT/RMS
(ix) TELECOMMUNICATION INFORMATION:
    TELEPHONE: (415) 781-1989
    TELEFAX: (415) 398-3249
    TELEX: 910 277299

INFORMATION FOR SEQ ID NO:1:

SEQUENCE CHARACTERISTICS:
    LENGTH: 3294 base pairs
    TYPE: nucleic acid
    STRANDEDNESS: unknown
    TOPOLOGY: unknown
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GTTGTGACTC

ATGAACAAAA TTTTTAACGT

TATTTGGAAT

AAACTTGGGT TGTCGTATCT

GAACTCACTC

ACCCTGTTGT

GCTTATGGCG

GCACCCACAC

CCGCAACGGT

ATGCGAATTT

GTTCAAGAGG CTTATAAAGG

TTGGTGGAGG

TCTAGCAAAA

TTGTTTGAAG

ATTACCTTTG

ACGPATTGGCG

ACAACTGATG

CACTTGAATG

ATTGACGGAG

AATGCGGGTT

GTCGATTTTG

ACTGTTACTG

ACTTCTGTTA

AATAAPAGTTG

GCGA.AAGATC-

AATGGTCAA;

GGTA.ATGGTW

GCGAAAGTTC

CTTACTGTG

TCAACTGAC(

AGCTGGACT.

ACAALTACTGC

ACGGCACAAG

GCAAAGGCGG

CTTTAGCGAPA

GTGGTGCTGC

GCTTGAAGTT

GTATTGGTTC

GAGATCAAAG

GGAATATCAA

TTCATACTTP

TAGATAGCAJP

.TCAAAGAAA;

ATGGTGCTA;

TGATTGACG(

ATGGCGACT*

k. CAACTGCGA' 3GCGACGGCT k. ATGATGGTA 3 AGAAGAAAT h. CAACTGCTC cAAATGCGCC

TGAGGCGAAC

TAATTTCACT

TTTATTAAAT

GGCGACCGTA

GAACGAGAAA

TGTGCAGGTT

AGACCTTGGT

TGCAGGTGCT

CGCTAAAGAT

AACCTTGACA

TACGCATTAC

GGGTGTTA-

CGATACTGTI

AGAAAACGGI

AACAATACTC

AATAATTCGA

CTAAATGA;A

GGCAATTTGC

AGCCAACAAG

ACTTCCACCT

GTGA.AAACTG

ACAACAACAC

GCTGCGGGTG

GACACGCTTG

ACTCGTGCAG

LGCTGGCTCAP

GAGTTCTTG;

AAGAGAACCC

TCCGCCACCG TGGCGGTTGC

CGTATTGGCA

CTGTTACGAA

TAAGTTGAAG

TAGCAGATGC

AAAATGCGAG

GTAAATTGGG

TCAAACATGC

CTGAAAACGG

CGACTGTGAG

CGAAAGTGAA

CTAATGGCGA

TGGGTTCTCC

CAAGTATCAA

CAACTGGTCA

SGTGCGGATAC

;AAGTTAAAAT

AGAAAAACAA

TGATAXACTG

CTGGGTATTG

GGATGAAGTG

CAAACACACC

TGATACCTTA

TGTAACTAGT

TACTACGGTT

TGCTACTCAT

GGATGTCTTG

ATCAGAAAAT

AGAGACCACG

CGGTG CGAAG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 16B0 AGACGGTAAG TTATTTACTG GAAAAGCTAA

CAAAGAGACA

CGCGACTGAA

AGTGAATAAG

CGCAACTG-T

TGTAACTAAT

AAAACTAGAT

GAACGCTAAT

GGTTACAGCA

TGAGGCGGAC

ATGCAGACG

A.CTGGTTGGA

GCATCAGGCA

GGCACCGATG

GGCGATAA.AA

A.ATCCGAAAG

AAAGGTTTAG

GGTGGTACGC

TTTAAAGCAG

CAAGATGCTI

ACTGAA.ATC;

AAGGCAAAGG C

GAATTAAAAC

CAAATGTAAC

GTATTACCGT

TCGCTGCAGA

GTAAAGTGGC

TAACAGCCTT

TTGATGGAA

GCAAGAACTT

TAACAGGCTT

ACAAAGACGG

TTAGTGACT

LACCGATGCT

ZTTTGCTAGT

rAAGTATGAT

TACGACCGCA

TGATGTTGCT

AAACAGTCTA

TGCAAGTGAG

AAAAGTGAA.A

AACGAGCATT

CTTAACCATC

CAAGA.AGTTA AAGCGGGCGA TAA.AGTAACC CAAGAGGGTG

CGAACTTTAC

ACTTTAGGTA

CAGGAAATAA

TTATTCACTG-

TGGTGCGAAA

ACACCAGCAA

ATTAGTGCGG

GCGAATTTCG

TATAAAGGCT

GACAATACCG

-AAAACCACAG

TTCAAAAGCG

ACTTTTGAAT

AATGGAAAGG

ATGGTGCGGG

GCGGTCAGTC

TGCAA.ATAAT

GGTTAAAAAC

GCAAACACCA TCAGCGTAAC GTTGTGAGCG

GACTGAAGAA

CAAAGACGGC

ATTTGGTGAT

1740 1800

ATCCGCTGAC

TGACCAATTT

CCGCAACCGT

GCGGCTCAAC

GCAACGGTAT

TGGCTAAAGG

AA.ACGAGCCT

TAGCTCCGCC GACAACTTAA' CGAAACAAAA

TGACGATGCC

GGATGAAAAA

GGGCGATTTG

GGAATATCAC

CAATGTTTCC

TGAAGTGGTT

GGTTAAAGTT

GGTACAGACA

CGCGGCT'rGG GATcAAGTTC

GGTAAAACGG

AAATCGAATG

GGCGATAAAT

AGCAAACTCC

GCTGGGTCAT

GGAATGCGAA

TCAACGGTAG

AATTTACCGT

ATTACAGCAA

CAGTTGCTGC

AAGCTACCAT

AGTTGTTGCC

TTCTGCGGAC

CGAAGTGAAA

GCGTGA.AATT

CAAAGAAACC

AGAGGATATT

GAXATATCAA

AACCAACA.AA

GACTTAACAA CAGGTCAGCC TAAATTAAAA

GATGGCAATA

GATAAAGGTG GCAAAGTCGT TTCTGTAACG GATAAfrACTG

GGTTCTGGCT

CTTGGCTITGG

TCTGCTGGTA

AATACCAAAG

ACAACCTTTG

AACGGTAAGA

GCTGACGGTA

AAGAAAGTTG

GATAAAACCA

AGCCTTGATC

GGCGATATT",

AAAGGGGTAJ

GGCAAACGT(

ACTATGCCA

TTAGCTATC

ACAACCAA'I

ATGTAACAGG TAACCAAGTG GCAGATGCGA

T

CTGATGAAGC TGATGCGAAA CGGGCGTTTG

AV

CAACGGAAAT TGTAAATGCC CACGATAAAG

T

TGAGCGCGGC AACGGTGGAA AGCACCGATG

C

TGAAAACCGA TGTGGAATTG

CCTTTAACGC

AAATCACTAA AGTTGTCAAA GATGGGCAAA

C

CGGCTGATAT GACCAAAGAA GTTACCCTCG

C

TGAAAGACAA CGATGGCAAG

TGGTATCACG

ALAGGCGAAGT GAG CAATGAT

AAAGTTTCTA

CAAATGATCA ATCAAAAGGT

AAAGGTGTCG

rCTGCCACTTC CACCGATGCG

ATTAACGGAA

~CAALACCTTGC TGGACAAGTG

AATAATCTTG

G CAGATGCAGG TACAGCAAGT

GCATTAGCGG

G GTAA.ATCAAT GGTTGCTATT

GCGGGA.AGTA

G GGGTATCAAG AATTTCCGAT

AATGGCAAAG

A GTCAAGGTA6A AACAGGCGTT

GCAGCAGGTG

rGCGAAATC AGGCTTTGAG rGATAAGAC

AAAAGCCTTA

CCGTTTTGC

TAATGGTTTA

AAACGGCGA

TAAAGTGACC

AATCTACAA

TACCGATGCA

:TAAATGGTA

TGAACTGAAT

;TAACGTGGA

TTCAGACGGC

ZCAAAGCTGA

CGGTACTGCG

CCGATGAAAA

ACACGTTGTC

TGATTGAcAA

TGTGGCTAAT

GTCAGT'rGTA

TGCTGTGGCA

AGGGCAAAGT

GAATAAAGTG

CTTCACAGTT

ACCACAAGCC

GTTATCAAGG

TCAA.AATGGT

TGATTATTCG

CTTGTCAGGC

TTGGTTACCA

GTGG

1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2560 2640 2'700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3294 INFORMATION FOR SEQ ID NO:2: SEQUENCE

CHARACTERISTICS:

LENGTH: 1098 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: Met Asn Lys Ile Phe Asn Val Ile Trp Asn Va 1 5 10 Val Val Val Ser Glu Leu Thr Arg Thr His Th 20 25 Thr Val Ala Val Ala Val Leu Ala Thr Leu Le 40 Ala Asn Asn Asn Thr Pro Val Thr Asn Lys L 55 Ala Asn Phe Asn Phe Thr Asn Asn Ser Ile A 65 70 7 Val Gin Glu Ala Tyr Lys Gly Leu Leu Asn L 90 Ser Asp Lys Leu Leu Val Glu Asp Asn Thr A 100 105 Leu Arg Lys Leu Gly Trp Val Leu Ser Ser L 115 120 Glu Lys Ser Gln Gin Val Lys His Ala Asp C 130 135 Lys Gly Gly Val Gin Val Thr Ser Thr Ser 145 150 Ile Thr Phe Ala Leu Ala Lys Asp Leu Gly 165 170 Ser Asp Thr Leu Thr Ile Gly Gly Gly Ala 180 185 Thr Pro Lys Val Asn Val Thr Ser Thr Thr 195 200 Lys Asp Ala Ala Gly Ala Asn Gly Asp Thr 210 215 Ile Gly Ser Thr Leu Thr Asp Thr Leu Val 225 230 1 ir u eu la 5 eu la ,ys Gll 3l 15 Va Al As

T

G

2 Val Thr Lys Cys Ser Ala Lys Ala Asp Ala Asn Glu Ala Thr Asn Gly 12 u Val Lei 140 u Asn Gl 1 Lys Th a Ala Gl ;p Gly Le 2 ir Val H 220 ly Ser P Gln Thr Tr Ala Ser Al Thr Val Gl Tyr Gly As Glu Lys G] 8( Lys Asn A: Val Gly A 110 t Thr Arg A u Phe Glu G y Lys His I r Ala Thr 175 .y Ala Thr 190 iu Lys Phe is Leu Asn ro Ala Thr p a u p 1n la sn sn ly 'hr Val Thr Ala Gly His 240 _v *aa.

a a. Ile Lys Ser Thr Asp 305 Thr Asn Asp Asn

GI

38~ G1 Va

L%

Al

L

4

S

A

Asp Asp Thr Val 290 Ser Ser Lys Gl Ly 37' y As 5 y As Ly 'S 1 a A 4 vs L 65 er T snA la er Gly Gly Val Leu 260 Thr G13 275 Glu Ph Lys GlI Val I1 Glu Th 34 u Gly Ly 355 s Thr GI 0 p Phe A n Gly T s Tyr A 4 e Ala A 435 sn Asn 50 ys Leu rp Thr la Ser Gly Lys 515 Leu Gin Asp 245 Asn Gir e Let u As e Ly 32 r As 0 s G1 y Ti la T hr T 4 sp A 20 la P Pro Val Thr Glu 500 Asn Asp

"I

1 5 r h h 0

T

4 Gin Ser Thr Ala Gly Trp Ser Glu Asn 280 Ser Ala Asp 295 Gly Lys Ar 310 Glu Lys As Lys Val As y Leu Val Th 36 p Arg Ile L) 375 r Val Ala S 390 r Ala Thr V 5 a Lys Val G sp Thr Thr P 4 ys Gly Lys 455 hr Ala Lys 470 hr Ala Ala 85 ;1n Glu Val Leu Lys Val Ala Leu Thr 535 52 His Tyr Thr Arg Ala 250 Asn Ile Lys Gly Val 265 Val Asp Phe Val His 285 Thr Glu Thr Thr Thr 300 g Thr Glu Val Lys Ile 315 p Gly Lys Leu Phe Th 330 p Gly Ala Asn Ala Th 345 r Ala Lys Asp Val Il 0 36 s Thr Thr Asp Ala As 380 er Gly Thr Asn Val Th 395 al Thr Asn Gly Thr A 410 ly Asp Gly Leu Lys L 425 la Leu Thr Val Asn A 40 4 al Ala Asp Val Ala S 460 Gly Leu Val Thr Ala I 475 Glu Ala Asp Gly Gly 490 Lys Ala Gly Asp Lys 505 Lys Gln Glu Gly Ala 520 Gly Leu Thr Ser Ile 540 Ala Ser 255 Lys Ala 270 Thr Tyr Val Thi Gly Al.

Gly Ly 33 r Glu As 350 e Asp Al 5 n Gly GI r Phe A 5p Gly I 4 eu Asp G 430 sp Gly 1 45 er Thr Leu Asn Thr Leu Val Thr 510 Asn Phe 525 Thr Leu a

S

1 l

S

4 Ile Gly Asp val Lys 320 Ala Ala a Val n Asn a Ser 400 e Thr .5 y Asp ys Asn sp Glu er Leu 480 sp Gly L95 Phe Lys Thr Tyr Gly Thr 530 Giv Asn Asri Gly Ala Lys Thr Giu Ile Asn Lys Asp Gly Leu Thr 545 Thr Pro Ala Asn 550

S.

Thr Ser Ser Thr 625 Asp Ile Val Val Al a 705 Asn Lys Lays G1y Al a 610 Asn As n Ser Arg Ser 690 Lvs Gi-, Gi Asp Leu 595 Asp Leu Thr Ala Asn 675 Gly Gly Lys IAsp Gly 580 Lys Asn Asp Al a Asp 660 Ala Lys Glu GlE 74( 1 Ala Asi y As: u Al a Le 82 g Ph~ 5 Gly Ala Gly 565 Ile Ser Ala Lys Phe Gly Leu Thr Lys 615 Giu Lys Gly 630 Ala Thr Val 645 Lys Thr Thr Asn Glu Val Thr Val Asn 695 Val Val Lys Al a Gly Asp 600 Gin Thr Gly Gly Lys Asn Gly 585 Al a Asn Asp Asp Gly 665 Phe Asn 570 Gin Asn Asp Lys Leu 650 Ser Lys Arg Glu Ala2 Ser Phe Asp Gin 635 Arg Thr Ser Giu Phe ksn Val Asp Al a 620 Thr Gly GiU Gly Ile 700 Thr rhr Lys Pro 605 Tyr Pro Leu Tyr Asn Ile Asn 590 Leu Lys Val Gly His 670 Gly Ser 575 Val Thr Gly Val Trp 655 Asp Ile Ile 560 Val Val Ser Leu Ala 640 Val Gin As n 685 Asn Thr Val Thr 770 Val Thr 785 Leu Giy Thr Lys Lys Val Va] As~ Gi' Le Al Ar 83

I

n a 0 Le Thr 725 Asp 710 Ser Leu Gly Ser Val Thr Arg Asn Leu Thr Lys Gi y Val 730 Gin Gly Pro Asp Lys Lys Leu Tyr Lys 750 Thr Val Tyr 735 Asp 745 Al a Thr Gln Asp 805 Ser Ala Lys Giu Val 790 Giu Al a Asn Tyr Gin 760 Ala Thr 775 Ala Asp Ala Asp Gly Thr Gly..Leu 840 Asp Ile Al a Al a Thr 825 Asn Lys Thr Ile Lys 810 Giu Thr Gly Asn Al a '795 Arg Ile Lys Gly Lys 780 Lys Ala Val Val1 Lys Val 765 Gly Ser Ser Gly Phe Asp Asn Ala 830 Ser Ala 845 Phe Giu Leu Lys Glu Thr '720 715 Val Ser Gly Tyr Phe Giu 800 Asp Lys 815 His Asp Ala Thr 54 Val Glu Ser Thr Asp Ala Asn Gly Asp Lys Val Thr Thr Thr Phe Val 850 855 860 Lys Thr Asp Val Glu Leu Pro Leu Thr Gln lie Tyr Asn Thr Asp Ala 865 870 875 880 Asn Gly Lys Lys Ile Thr Lys Val Val Lys Asp Gly Gln Thr Lys Trp 885 890 895 Tyr Glu Leu Asn Ala Asp Gly Thr Ala Asp Met Thr Lys Glu Val Thr 900 905 910 Leu Gly Asn Val Asp Ser Asp Gly Lys Lys Val Val Lys Asp Asn Asp 915 920 925 Gly Lys Trp Tyr His Ala Lys Ala Asp Gly Thr Ala Asp Lys Thr Lys 930 935 940 Gly Glu Val Ser Asn Asp Lys Val Ser Thr Asp Glu Lys His Val Val 945 950 955 960 Ser Leu Asp Pro Asn Asp Gln Ser Lys Gly Lys Gly Val Val Ile Asp 965 970 975 Asn Val Ala Asn Gly Asp Ile Ser Ala Thr Ser Thr Asp Ala Ile Asn 980 985 990 Gly Ser Gln Leu Tyr Ala Val Ala Lys Gly Val Thr Asn Leu Ala Gly 995 1000 1005 Gln Val Asn Asn Leu Glu Gly Lys Val Asn Lys Val Gly Lys Arg Ala 1010 1015 1020 Asp Ala Gly Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu Pro Gln Ala 1025 1030 1035 1040 Thr Met Pro Gly Lys Ser Met Val Ala Ile Ala Gly Ser Ser Tyr Gln 1045 1050 1055 Gly Gln Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser Asp Asn Gly 1060 1065 1070 Lys Val Ile Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln Gly Lys Thr 1075 1080 1085 Gly Val Ala Ala Gly Val Gly Tyr Gln Trp 1090 1095 INFORMATION FOR SEQ ID NO:3: SEQUENCE

CHARACTERISTICS:
LENGTH: 7291 base pairs
TYPE: nucleic acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
NAME/KEY: CDS
LOCATION: 163..7221
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
TTTNTTTTTC TTATTTTTTT TTTTTTTTTT TTTTTTTTTT TTGAGGCTAA ACTTTTNGNA 60
AAATATCACT TTTTTATTCT CCAAATATAG AATAGAATAC GCACGATTTC ACTAAGAAAA 120
GTATATTTAT CATTAATTTT ATTAAATATA AGGTAAATAA AA ATG AAC AAA ATT 174
Met Asn Lys Ile
1

TTT AAC GTT ATT TGG AAT GTT ATG ACT CAA ACT Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr TGG GTT Trp Val 10 GAA CTC ACT CGC ACC CAC ACC AAA CGC Glu Leu Thr Arg Thr His Thr Lys Arg 25 GCC TCC GCA ACC Ala Ser Ala Thr 30 ACG GTT CAG GCG Thr Val Gln Ala GTC GTA TCT Val Val Ser GTG GAG ACC Val Glu Thr AAT GCT ACC Asn Ala Thr CCC GTG TTG Pro Val Leu GCC GTA TTG Ala Val Leu

GCG

Ala 40 ACA CTG TTG Thr Leu Leu TTT GCA Phe Ala 45 CCC GTA Pro Val 60 GAT GAA GAT GAA GAG TTA GAC Asp Glu Asp Glu Glu Leu Asp GTA CGC ACT Val Arg Thr

GCT

Ala AG-- TTC Ser Phe CAT TCC GAT AAA His Ser Asp Lys

GAA

Glu GGC ACG GGA GAA Gly Thr Gly Glu GAA GTT ACA GAA Glu Val Thr Glu 222 270 318 366 414 462 510 558 606

AAT

Asn TCA AAT TGG GGA Ser Asn Trp Gly

ATA

Ile 90 TAT TTC GAC AAT Tyr Phe Asp Asn

AAA

Lys 95 GGA GTA CTA AAA Gly Val Leu Lys

GCC

Ala 100 GGA GCA ATC ACC Gly Ala Ile Thr

CTC

Leu 105 AAA GCC GGC GAC Lys Ala Gly Asp

AAC

Asn 110 CTG AAA ATC AAA CAA AAC Leu Lys Ile Lys Gln Asn ACC GAT Thr Asp GAC CTC Asp Leu GCA AAC Ala Asn 150

GAA

Glu ACC AAT GCC AGT Thr Asn Ala Ser TTC ACC TAC TCG Phe Thr Tyr Ser CTG AAA AAA Leu Lys Lys 130 ACA GAT CTG ACC Thr Asp Leu Thr 135 GGC GAT AAA GTT Gly Asp Lys Val

AGT

Ser

GTT

Val 140 GCA ACT GAA AAA TTA TCG TTT GGC Ala Thr Glu Lys Leu Ser Phe Gly GAT ATT ACC AGT GAT GCA AAT GGC TTG AAA Asp Ile Thr Ser Asp Ala Asn Gly Leu Lys 155 160 654

TTG

Leu 165 GCG AAA ACA GGT AAC GGA AAT Ala Lys Thr Gly Asn Gly Asn GTT CAT TTG AAT GGT TTG GAT TCA Val His Leu Asn Gly Leu Asp Ser 175 180 702 750 ACT TTG CCT GAT GCG GTA ACG AAT ACA GGT GTG TTA AGT TCA Val Thr Asn Thr Gly Val Leu Ser Ser 190 Ser Ser 195 Thr Leu Pro Asp Ala 185 TTT ACA CCT Phe Thr Pro GTT TTA AAT Val Leu Asn 215 GAT GTT GAA AAA Asp Val Glu Lys

ACA

Thr 205 AGA GCT GCA ACT Arg Ala Ala Thr GTT AAA GAT Val Lys Asp 210 GCT GGA GGT Ala Gly Gly GCA GGT TGG AAC Ala Gly Trp Asn

ATT

Ile 220 AAA GGT GCT AAA Lys Gly Ala Lys

ACT

Thr 225 798 846 894 942

AAT GTT Asn Val 230 GAG AGT GTT GAT Glu Ser Val Asp GTG TCC GCT TAT Val Ser Ala Tyr AAT GTT GAA TTT Asn Val Glu Phe

ATT

Ile 245 ACA GGC GAT AAA Thr Gly Asp Lys

AAC

Asn 250 ACG CTT GAT GTT Thr Leu Asp Val

GTA

Val 255 TTA ACA GCT AAA Leu Thr Ala Lys

GAA

Glu 260 AAC GGT AAA ACA Asn Gly Lys Thr

ACC

Thr 265 GAA GTG AAA TTC Glu Val Lys Phe

ACA

Thr 270 CCG AAA ACC Pro Lys Thr AAA GAA AAA Lys Glu Lys AAT AAA GTT Asn Lys Val 295 GGC TTA GTC Gly Leu Val 310

GAC

Asp 280 GGT AAG TTA TTT Gly Lys Leu Phe

ACT

Thr 285 GGA AAA GAG AAT Gly Lys Glu Asn TCT GTT ATC Ser Val Ile 275 AAC GAC ACA Asn Asp Thr 290 GAG GGT AAT Glu Gly Asn AAG GCT GGT Lys Ala Gly ACA AGT AAC ACG Thr Ser Asn Thr

GCG

Ala 300 ACT GAT AAT ACA Thr Asp Asn Thr 990 1038 1086 1134 1182 1230

ACT GCA AAA Thr Ala Lys GTG ATT GAT GCT Val Ile Asp Ala GTG AAC Val Asn 320 TGG AGA GTT AAA Trp Arg Val Lys 325 ACT GTT GCG TCA Thr Val Ala Ser ACA ACT Thr Thr 330 ACT GCT AAT GGT Thr Ala Asn Gly AAT GGC GAC TTC Asn Gly Asp Phe

GCA

Ala 340

GGC

Gly 345 ACA AAT GTA ACC Thr Asn Val Thr

TTT

Phe 350 GAA AGT GGC GAT Glu Ser Gly Asp GGT ACA Gly Thr 355 ACA GCG TCA Thr Ala Ser TAC GAC GCG Tyr Asp Ala 375

GTA

Val 360 ACT AAA GAT Thr Lys Asp ACT AAC Thr Asn 365 GGC AAT GGC ATC Gly Asn Gly Ile ACT GTT AAG Thr Val Lys 370 GAAAA AAA Asp Lys Lys 1278 1326 AAA GTT GGC GAC GGC TTG AAA TTT GAT AGC Lys Val Gly Asp Gly Leu Lys Phe Asp Ser 380 385 ATC GTT Ile Val 390 GCA GAT ACG ACC Ala Asp Thr Thr

GCA

Ala 395 CTT ACT GTG Leu Thr Val ACA GGT Thr Gly 400 GGT AAG GTA GCT Gly Lys Val Ala 1374 GAA ATT GCT AAA GAA GAT GAC AAG AAA AAA Glu Ile Ala Lys Glu Asp Asp Lys Lys Lys CTT GTT AAT GCA GGC GAT 1422 Leu 415 Val Asn Ala Gly Asp 420 405 410 TTG GTA ACA GCT Leu Val Thr Ala

TTA

Leu 425 GGT AAT CTA AGT Gly Asn Leu Ser

TGG

Trp 430 AAA GCA AAA GCT Lys Ala Lys Ala GAG GCT Glu Ala 435 1470 GAT ACT GAT Asp Thr Asp GCA GGC GAA Ala Gly Glu 455

GGT

Gly 440 GCG CTT GAG GGG Ala Leu Glu Gly

ATT

Ile 445 TCA AAA GAC CAA Ser Lys Asp Gln GAA GTC AAA Glu Val Lys 450 AAA GTG AAA Lys Val Lys 1518 ACG GTA ACC TTT AAA GCG GGC AAG AAC Thr Val Thr Phe Lys Ala Gly Lys Asn 460

TTA

Leu 465 1566 CAG GAT Gln Asp 470 GGT GCG AAC TTT Gly Ala Asn Phe

ACT

Thr 475 TAT TCA CTG CAA GAT GCT TTA ACG GGT Tyr Ser Leu Gln Asp Ala Leu Thr Gly 480

TTA

Leu 485 ACG AGC ATT ACT Thr Ser Ile Thr

TTA

Leu 490 GGT GGT ACA ACT Gly Gly Thr Thr

AAT

Asn 495 GGC GGA AAT GAT Gly Gly Asn Asp

GCG

Ala 500 AAA ACC GTC ATC Lys Thr Val Ile

AAC

Asn 505 AAA GAC GGT TTA Lys Asp Gly Leu

ACC

Thr 510 ATC ACG CCA GCA GGT AAT Ile Thr Pro Ala Gly Asn 515 1614 1662 1710 1758 1806 GGC GGT ACG Gly Gly Thr AAA GCA GGT Lys Ala Gly 535

ACA

Thr 520 GGT ACA AAC ACC Gly Thr Asn Thr

ATC

Ile 525 AGC GTA ACC AAA Ser Val Thr Lys GAT GGC ATT Asp Gly Ile 530 TTA AGA GCT Leu Arg Ala AAT AAA GCT ATT Asn Lys Ala Ile

ACT

Thr 540 AAT GTT GCG AGT Asn Val Ala Ser

GGT

Gly 545 TAT GAC Tyr Asp 550 GAT GCG AAT TTT

Asp Ala Asn Phe GTT TTA AAT AAC Val Leu Asn Asn

TCT

Ser 560 GCA ACT GAT TTA Ala Thr Asp Leu 1854

AAT

Asn 565 AGA CAC GTT GAA GAT GCT TAT AAA GGT Arg His Val Glu Asp Ala Tyr Lys Gly

TTA

Leu 575 TTA AAT CTA AAT Leu Asn Leu Asn

GAA

Glu 580 AAA AAT GCA AAT AAA CAA CCG TTG GTG Lys Asn Ala Asn Lys Gln Pro Leu Val

ACT

Thr 590 GAC AGC ACG GCG Asp Ser Thr Ala GCG ACT Ala Thr 595 1902 1950 1998 GTA GGC GAT Val Gly Asp TTA CGT AAA TTG GGT TGG GTA GTA TCA ACC AAA AAC GGT Leu Arg Lys Leu Gly Trp Val Val Ser Thr Lys Asn Gly 600 605 610 ACG AAA GAA GAA AGC Thr Lys Glu Glu Ser 615 AAT CAA GTT Asn Gln Val 620 AAA CAA GCT Lys Gln Ala GAT GAA GTC CTC TTT Asp Glu Val Leu Phe 625 TCT GAA AAC GGT AAA Ser Glu Asn Gly Lys 640 ACC GGA GCC GGT GCT GCT Thr Gly Ala Gly Ala Ala

ACG

Thr 635 GTT ACT TCC AAA Val Thr Ser Lys 2046 2094 2142 2190

CAT

His 645 ACG ATT ACC GTT Thr Ile Thr Val

AGT

Ser 650 GTG GCT GAA ACT Val Ala Glu Thr

AAA

Lys 655 GCG GAT TGC GGT Ala Asp Cys Gly

CTT

Leu 660 GAA AAA GAT GGC Glu Lys Asp Gly

GAT

Asp 665 ACT ATT AAG CTC Thr Ile Lys Leu

AAA

Lys 670 GTG GAT AAT CAA Val Asp Asn Gin AAC ACT Asn Thr 675 GAT AAT GTT Asp Asn Val GGC TTT GAA Gly Phe Glu 695

TTA

Leu 680 ACT GTT GGT AAT Thr Val Gly Asn

AAT

Asn 685 GGT ACT GCT GTC Gly Thr Ala Val ACT AAA GGT Thr Lys Gly 690 CGC GGT AAA Arg Gly Lys 2238 2286 ACT GTT AAA ACT Thr Val Lys Thr GCG ACT GAT GCA Ala Thr Asp Ala

GAT

Asp 705 GTA ACT Val Thr 710 GTA AAA GAT GCT Val Lys Asp Ala GCT AAT GAC GCT Ala Asn Asp Ala

GAT

Asp 720 AAG AAA GTC GCA Lys Lys Val Ala

ACT

Thr 725 GTA AAA GAT GTT Val Lys Asp Val

GCA

Ala 730 ACC GCA ATT AAT Thr Ala Ile Asn

AGT

Ser 735 GCG GCG ACT TTT Ala Ala Thr Phe

GTG

Val 740 AAA ACA GAG AAT Lys Thr Glu Asn

TTA

Leu 745 ACT ACC TCT ATT Thr Thr Ser Ile

GAT

Asp 750 GAA GAT AAT CCT Glu Asp Asn Pro ACA GAT Thr Asp 755 AAC GGC AAA Asn Gly Lys GCA GGT AAA Ala Gly Lys 775

GAT

Asp 760

AAC

Asn GAC GCA CTT AAA Asp Ala Leu Lys CTG AAA GTT AAA Leu Lys Val Lys 780 GGC GAT ACC TTA Gly Asp Thr Leu ACC TTT AAA Thr Phe Lys 770 ATT ACT

TTT

Ile Thr Phe 2334 2382 2430 2478 2526 2574 2622 2670 CGT GAT GGA AAA Arg Asp Gly Lys

AAT

Asn 785 GAC TTG Asp Leu 790 GCG AAA AAC CTT Ala Lys Asn Leu

GAG

Glu 795 GTG AAA ACT GCG Val Lys Thr Ala

AAA

Lys 800 GTG AGT GAT ACT Val Ser Asp Thr

TTA

Leu 805 ACG ATT GGC GGG Thr Ile Gly Gly ACA CCT ACA GGT Thr Pro Thr Gly

GGC

Gly 815 ACT ACT GCG ACG Thr Thr Ala Thr

CCA

Pro 820 AAA GTG AAT ATT Lys Val Asn Ile

ACT

Thr 825 AGC ACG GCT GAT Ser Thr Ala Asp GGT TTG AAT TTT GCA AAA GAA Gly Leu Asn Phe Ala Lys Glu 830 835 ACA GCC GAT Thr Ala Asp ACA ACT TTA Thr Thr Leu 855

GCC

Ala 840 TCG GGT TCT AAG Ser Gly Ser Lys AAT GTT TAT TTG AAA GGT ATT GCG Asn Val Tyr Leu Lys Gly Ile Ala 845 850 2718 2766 GCG AAG TCT TCA CAC GTT GAT ACT GAG CCA AGC Thr Glu Pro Ser GCG GGA Ala Gly 860 Ala Lys Ser Ser 865 His Val Asp TTA AAT Leu Asn 870 GTG GAT GCG ACG Val Asp Ala Thr AAA TCC AAT GCA Lys Ser Asn Ala AGT ATT GAA GAT Ser Ile Glu Asp

GTA

Val 885 TTG CGC GCA GGT Leu Arg Ala Gly

TGG

Trp 890 AAT ATT CAA GGT Asn Ile Gin Gly GGT AAT AAT GTT Gly Asn Asn Val

GAT

Asp 900 TAT GTA GCG ACG Tyr Val Ala Thr

TAT

Tyr 905 GAC ACA GTA AAC Asp Thr Val Asn ACC GAT GAC AGC Thr Asp Asp Ser ACA GGT Thr Gly 915 ACA ACA ACG Thr Thr Thr GTT AAA ATC Val Lys Ile 935

GTA

Val 920 ACC GTA ACC CAA Thr Val Thr Gin

AAA

Lys 925 GCA GAT GGC AAA Ala Asp Gly Lys GGT GCT GAC Gly Ala Asp 930 GGT GCG AAA ACT Gly Ala Lys Thr

TCT

Ser 940 GTT ATC AAA GAC CAC AAC GGC AAA Val Ile Lys Asp His Asn Gly Lys 945 CTG TTT Leu Phe 950 ACA GGC AAA GAC Thr Gly Lys Asp

CTG

Leu 955 AAA GAT GCG AAT Lys Asp Ala Asn

AAT

Asn 960 GGT GCA ACC GTT Gly Ala Thr Val 2814 2862 2910 2958 3006 3054 3102 3150 3198 3246 3294

AGT

Ser 965 GAA GAT GAT GGC Glu Asp Asp Gly

AAA

Lys 970 GAC ACC GGC ACA Asp Thr Gly Thr

GGC

Gly 975 TTA GTT ACT GCA Leu Val Thr Ala

AAA

Lys 980 ACT GTG ATT GAT Thr Val Ile Asp

GCA

Al a 985 GTA AAT AAA Val Asn Lys AGC GGT Ser Gly 990 TGG AGG GTA ACC Trp Arg Val Thr GGT GAG Gly Glu 995 GGC GCG ACT Gly Ala Thr GCC GAA Ala Glu 1000 ACC GGT GCA ACC GCC GTG Thr Gly Ala Thr Ala Val 1005 GGC ACG AGC GTG AAC TTC Gly Thr Ser Val Asn Phe 1020 GAA ACC GTT ACA TCA Glu Thr Val Thr Ser 1015 AAT GCG GGT AAC GCT Asn Ala Gly Asn Ala 1010 AAA AAC GGC AAT GCG Lys Asn Gly Asn Ala 1025 ATC AAT GTC AAA TAC Ile Asn Val Lys Tyr 1040 ACC ACA GCG Thr Thr Ala 1030 ACC GTA AGC Thr Val Ser AAA GAT AAT GGC AAC Lys Asp Asn Gly Asn 1035 GAT GTA AAT GTT GGT GAC GGC TTG AAG ATT GGC GAT GAC AAA AAA ATC Asp Val Asn Val Gly Asp Gly Leu Lys Ile Gly Asp Asp Lys Lys Ile 1045 1050 1055 1060 3342 CTT GCA GAC Val Ala Asp CCT GCT GGT Pro Ala Gly ACG ACC ACA Thr Thr Thr 31065 CTT ACT GTA ACA GGT Leu Thr Val Thr Gly 1070 GGT AAG GTG TCT GTT Gly Lys Val Ser Val 1075 GCT AAT Ala Asn 1080 ACT GTT AAT Ser Val Asri AAC AAT Asn Asn 1085 AAG AAA CTT Lys Lys Leu GTT A.AT GCA Val Asn Ala 1090 3390 3438 3486 3534 GAG GGT TTA GCG Glu Gly Leu Ala 1095 GAT AAA TfAT GCA Asp Lys Tyr Ala 1110 ACT GCT TTA Thr Ala Leu AAC AAC Asn Asn 1100 CTA AGC TGG Leu Ser Trp ACG GCA AAA CC Thr Ala Lys Ala 1105 CAT GGC GAG TCA Asp Gly Glu Ser 1115 GAG GGC GAA Giu Gly Glu ACC GAC CAA GAA GTC Thr Asp Gin Giu Val 1120 AAA CCA CCC GAC A.AA GTA ACC TTT AAA GCA CCC AAC AAC TTA AAA GTG Lys Ala Cly Asp Lays Val Thr Phe Lys Ala Cly Lys Asn Leu Lys Val 3582 1125 1130 1135 1140U AAA CAC TCT GA.A AAA CAC TTT ACT TAT TCA CTC CAA CAC ACT Lys Cln Ser Giu Lys Asp Phe Thr Tyr Ser Leu Gln-Asp Thr 1.145 1150 TTA ACA Leu Thr 1155 3630 CCC TTA ACG ACC ATT ACT Gly Leu Thr Ser Ile Thr 1160 ACG CCA ACC GTC ATC AAC Thr Cly Thr Val Ile Asn 1175 CCT GCT CC GCA GGC ACA Cly Ala Ala Ala Gly Thr 1190 TTA GGT GCT Leu Cly Cly 1165 AAA CAC GC Lys Asp Cly 1180 ACA GCT AAT CCC AGA AAT CAT Thr Ala Asn Cly Arg Asn Asp 1170 TTA ACC ATC ACC CTC GCA AAT Leu Thr Ile Thr Leu Ala Asn 3678 3726 3774


1185 CAT CC TCT AAC Asp Ala Ser Asn 1195 GCA AAC ACC Gly Asn Thr 1200 ATC ACT CTA Ile Ser Val

ACC

Thr 120~ AAA CAC Lys Asp ACT CCT TTA Ser Ala Leu CAA CAT AAA Gin Asp Lys CCC ATT ACT CC Cly Ile Ser Ala 1210 AAA ACC TAT AAA Lys Thr Tyr Lys 1225 GAG TTC CAC GCC Ciu Phe His Ala 1240 CCT AAT AAA Cly Asn Lys CAA ATT Clu Ile 1215 ACC AAT CTT Thr Asn Val CAT ACT CAA AAC ACT CCA CAT CAA ACA Asp Thr Gin Asn Thr Ala Asp Ciu Thr 1230 1235 CCC CTT AAA AAC GCA AAT GAA CTT GAG Ala Val Lys Asn Ala Asn Giu Val Clu 1245 1250

AAC

Lys 1220 3822 3870 3918 TTC GTC CCT AAA AAC GGT CCA ACC GTC TCT CCA AA.A Phe Val Cly Lys Asn Gly Ala Thi Val Ser Ala Lys ACT CAT AAC AAC Thr Asp Asn Asn 1265 3966 4014 1255 1260 GCA AAA CAT ACT GTA ACC ATT CAT CTT CCA CAA CCC AAA CTT GGT CAT Gly Lys His Thr Val Thr Ile Asp Val Ala Clu Ala Lys Val Cly Asp 12*70 1275 1280 GGT CTT Gly Leu 1285 GAA AAA GAT ACT GAC GGC Glu Lys Asp Thr Asp Gly 1290 AAG ATT AAA CTC AAA GTA GAT A.AT Lys Ile Lys Leu Lys Val Asp Asri 1295 1300 GTT GAT GCA ACA AAA GGT GCA TCC Val Asp Ala Thr Lys Gly Ala Ser 1310 1315 4062 4110 ACA GAT GGG AAT AAT CTA TTA ACC Thr Asp Gly Asn Asn Leu Leu Thr 1305 GTT GCC AAG GGC GAG Val Ala Lys Gly Glu 1320 CAA GGC ACA AAT GCC Gin Gly Thr Asn Ala 1335 TTT AAT GCC Phe Asn Ala GTA ACA ACA Val Thr Thr 1325 GAT GCA ACT ACA GCC Asp Ala Thr Thr Ala 1330 GTT GTC AAG GGT TCA Val Val. Lys Gly Ser 1345 4158 AAT GAG CCC GGT AAA GTG Asn Clu Arg Cly Lys Val.

1340 4206


AAT GGT GCA ACT GCT ACC GAA ACT GAC AAG AAA AAA GTG GCA ACT GTT Asn Gly Ala Thr Ala Thr Glu Thr Asp Lys Lys Lys Val Ala Thr Val 1350 1355 1360 4254 GGC GAC GTT Gly Asp Val 1365 GAA AAT GAC Clu Asn Asp GCA AAT GAT Ala Asn Asp GCT AAA GCG ATT AAC GAC Ala Lys Ala Ile Asn Asp 1370 CAC ACT GCT ACG ATT GAT Asp Ser Ala Thr Ile Asp 1385 GCA GCA ACT TTC GTG AAA GTG Ala Ala Thr Phe Val Lys Val 1375 1380 4302 GAT AGC CCA Asp Ser Pro 1390 ACA GAT CAT GGC Thr Asp Asp Gly 1395 TTA AA.A GCG GGT Leu Lys Ala Gly 1410 GCT CTC AAA GCA GGC GAC ACC TTG ACC Ala Leu Lys Ala Gly Asp Thr Leu Thr 1400 1405 4350 4398 4446 4494 AAA AAC TTA AAA GTT AAA CGT GAT GGT AAA AAT ATT ACT TTT CCC CTT Lys Asn Leu Lys Val Lys Arg Asp Gly Lys Asn Ile Thr Phe Ala Leu 1415 1420 1425 GCG AAC GAC CTT AGT GTA AAA AGC GCA ACC CTT AGC CAT AAA TTA TCG Ala Asn Asp Leu Ser Val. Lys Ser Ala Thr Val Ser Asp Lys Leu Ser 1430 1435 1440 CTT GGT ACA Leu Gly Thr 1445 TTG AAC TTC Leu Asn Phe TTA AAT GGC Leu Asn Gly AAC GGC AAT AAA GTC AAT Asn Gly Asn Lys Val Asn 1450 GCT AAA GAT AGT AAG ACA Ala Lys Asp Ser Lys Thr 1465 ATC ACA AGC GAC ACC AAA GCC Ile Thr Ser Asp Thr Lys Gly 1455 1460 GGC GAT GAT GCT AAT ATT CAC Gly Asp Asp Ala Asn Ile His 1470 1475 4542 4590 ATT GCT TCA ACT TTA ACT GAT ACA TTG TTA AAT AGT GGT Ile Ala Ser Thr Leu Thr Asp Thr Leu Leu Asn Ser Gly '1480 1485 1490 4638 4686 GC ACA ACC AAT TTA GGT GCT A.AT GCT ATT ACT CAT AAC GAG AAA AAA Ala Thr Thr Asn Leu Cly Cly Asn Cly Ile Thr Asp Asn Clu Lys Lys 1495 1500 1505 CGC GCG GCG Arg Ala Ala 1510 AGC GTT AAA Ser Val Lys GAT GTC Asp Val 1515 TTG AAT GCG GGT TGG AAT GTT CGT Leu Asn Ala Gly Trp Asn Val Arg 1520 4734 GGT GTT Gly Val 1525 AAA CCG GCA Lys Pro Ala TCT GCA Ser Ala 1530 AAT AAT CAA Asn Asn Gin GTG GAG Val Glu 1535 AAT ATC GAC TTT Asn Ile Asp Phe 1540 GTA GCA ACC TAC GAC ACA Val Ala Thr Tyr Asp Thr 1545 ACG AGT GTA ACT GTT GAA Thr Ser Val Thr Val Giu 1560 GTG GAC TTT Val Asp Phe GTT AGT Val Ser 1550 GGA GAT AAA Gly Asp Lys GAC ACC Asp Thr 1555 AGT AAA GAT 
AAT GGC AAG Ser Lys Asp Asn Gly Lys 1565 AGA ACC GAA GTT Arg Thr Glu Val 1570 AAC GGC AAA CTG Asn Gly Lys Leu 1585 4782 4830 4878 4926 4974


AAA ATC GGT GCG AAG ACT TCT Lys Ile Gly Ala Lys Thr Ser 1575 GTT ATC Val Ile 1580 AAA GAC CAC Lys Asp His TTT ACA GGC Phe Thr Gly 1590 ACC GAA ACC Thr Glu Thr 1605 AAA GAG CTG Lys Glu Leu AAG GAT GCT Lys Asp Ala 1595 AAC AAT AAT GGC GTA ACT GTT Asn Asn Asn Gly Val Thr Val 1600 GAC GGC AAA GAC GAG GGT Asp Gly Lys Asp Glu Gly 1610 AAT GGT TTA GTG ACT GCA Asn Gly Leu Val Thr Ala 1615

AAA

Lys 1620 5022 GCT GTG ATT GAT GCC GTG AAT AAG, GCT GGT TGG AGA GTT AAA Ala Val Ile Asp Ala Val Asn Lys Ala Gly Trp Arg Vai Lys 1625 1630 ACA ACA Thr Thr 1635 GGT GC-T AAT GGT CAG AAT GAT GAC TTC GCA ACT GTT GCG TCA GGC ACA Gly Ala Asn Gly Gin Asn Asp Asp Phe Ala Thr Val Ala Ser Gly Thr 1640 1645 1650 AAT GTA ACC TTT GCT GAT GGT AAT GGC ACA ACT GCC GAA GTA ACT AAA Asn Val Thr Phe Ala Asp Gly Asn Gly Thr Thr Ala Glu Val Thr Lys 1655 1660 1665 5070 5118 5166 GCA AAC GAC GGT AGT ATT ACT GTT Ala Asn Asp Gly Ser Ile Thr Val 1670 1675 GGC TTA AAA CTA GAC GGC GAT AAA Gly Leu Lys Leu Asp Gly Asp Lys 1685 1690 TAC AAT GTT AAA GTG GCT GAT Tyr Asn Val Lys Val Ala Asp 1680 5214 5262 ATC GTT GCA GAC ACG ACC Ile Val Ala Asp Thr Thr 1695 GTA CTT Val Leu 1700 GGT AAG Gly Lys 1715 ACT GTG GCA GAT GGT AAA GTT ACA GCT Thr Val Ala Asp Gly Lys Val Thr Ala 1705 CCG AAT AAT GGC GAT Pro Asn Asn Gly Asp 1710 5310 AAA TTT GTT Lys Phe Val GAT GCA AGT GGT TTA Asp Ala Ser Gly Leu 1720 GCG GAT GCG TTA AAT AAA TTA AGC Ala Asp Ala Leu Asn Lys Leu Ser 1725 1730 5358 TGG ACG GCA ACT Trp Thr Ala Thr 1735 GCT GGT AAA GAA GGC Ala Gly Lys Giu Gly 1740 ACT GGT GA.A GTT GAT CCT GCA Thr Gly Glu Val Asp Pro Ala 1745 5406 AAT TCA GCA Asn Ser Ala 1750 GGG CAA GAA GTC AAA Gly Gin Giu Val Lys 1755 GCG GGC GAC AAA GTA Ala Gly Asp Lys Val 1760 GCC GGC Ala Gly 1765 GAC AAC CTG Asp Asri Leu AAA ATC Lys Ile 1770 AAA CAA AGC GGC AAA GAC Lys Gin Ser Gly Lys Asp 1775 ACC TTT AAA Thr Phe Lys TTT ACC TAC Phe Thr Tyr 1780 TTC AAA GAC Phe Lys Asp 1795 5454 5502 TCG CTG AAA AAA Ser Leu Lys Lys GAG CTG Glu Leu 1785 AAA GAC CTG Lys Asp Leu ACC AGC GTA GAG Thr Ser Val Giu 1790 5550 GCA AAC GGC GGT ACA GGC ACT GAA AGC ACC AAG ATT ACC AAA GAC GGC Ala Asn Gly Gly Thr Cly Ser Clu Ser Thr Lys Ile Thr Lys Asp Gly 5598 1800 1805 1810 TTG ACC ATT ACG Leu Thr Ile Thr 1815 CCG GCA AAC Pro Ala Asn GGT CC GGT GCG GCA GCT GCA AAC ACT Gly Ala Cly Ala Ala Gly Ala Asn Thr 1820 1825 5646 5694 GCA A.AC ACC ATT AGC GTA ACC AAA CAT GGC ATT ACC 
GCG GGT AAT AAA Ala Asn Thr Ile Ser Val Thr Lys Asp Cly Ile Ser Ala Gly Asn Lys 1830 1835 1840 GCA GLT Ala Val 1845 ACA AAC GTT GTC AGC GGA CTG AAG AAA TTT Thr Asn Val Val Ser Cly Leu Lys Lys Phe 1850 1855 GCT GAT GCT CAT Gly Asp Cly His 1860 CAT TAT GAC AAT His Tyr Asp Asn 1875 5742 ACG TTG GCA AAT Thr Leu Ala Asn GGC ACT GTT GCT CAT TTT GAA AAG Gly Thr Val Ala Asp Phe Giu Lys 1865 1870 GCC TAT AAA Ala Tyr Lys

GAC

Asp 18B( TTG ACC AAT TTG CAT GAA Leu Thr Asn Leu Asp Glu 1885 GAC AAT ACC CCT CCA ACC Asp Asn Thr Ala Ala Thr 1900 CCG ACT GTT GCC Pro Thr Val Ala 1895 AAA CGC GCG GAT AAT AAT Lys Gly Ala Asp Asri Asn 1890 GTG GGC CAT TTG CGC GGC Val Gly Asp Leu Arg Gly 1905 ACA GGC GAA CCC AAT CAG Thr Gly Glu Pro Asn Gin 1920 5790 5838 5886 TTG CCC TGG Leu Gly Trp 1910 GTC ATT TCT Val Ile Ser GCG GAC AAA ACC Ala Asp Lys Thr 1915 5934

GAA

Glu 1925 TAC AAC GCG Tyr Asn Ala CAA GTG CGT AAC GCC Gin Val Arg Asn-Ala 1930 AAT GTT TCC GGT AAA Asn Val Ser Cly Lys 1945 AAT CAA GTG AAA TTC Asn Ciu Val Lys Phe 1935 ACA TTG AAC GGT ACC Thr Leu Asn Gly Thr 1950 AAG AGC Lys Ser 1940 CCC CTC Arg Val 1955 5982 6030 GGC AAC GGT ATC Gly Asn Gly Ile ATT ACC TTT GAA TTG Ile Thr Phe Giu Leu 1960 ACC GTT AAG AAT GCC Thr Val Lys Asn Ala 1975 GCT AAA GGC GAA GTG GTT AAA TCG AAT GAA TTT Ala Lys Gly Glu Val Val Lys Ser Asn Giu Phe 1965 1970 6078 GAT GGT TCG GAA Asp Gly Ser Glu 1980 ACG AAC TTG GTT AAA GTT GGC Thr Asfl Leu Val Lys Val Gly 1985 6126 GAT ATG TAT Asp Met Tyr 1990 TAC AGC AAA Tyr Ser Lys GAG GAT Glu Asp 1995 ATT GAC CCG GCA ACC Ile Asp Pro Ala Thr 2000 AGT AAA CCG Ser Lys Pro 6174 ATG A CA Met Thr 2005 GGT AAA ACT GAA AAA TAT Gly Lys Thr Glu Lys Tyr 2010 AAG GTT GAA AAC Lys Val Glu Asn 2015 GGC AAA GTC Gly Lys Val

GTT

Val 2020 6222 9 TCT GCT AAC GGC Ser Ala Asn Gly AGC AAG ACC Ser Lys Thr 2025 GAA GTT ACC CTA Glu Val Thr Leu 2030 ACC AAC Thr Asn AAA GGT TCC Lys Gly Ser 2035 AAA TCA GGC Lys Ser Gly 2050 GGC TAT GTA Gly Tyr Val ACA GGT AAC Thr Gly Asn 2040 CAA GTG GCT GAT Gin Val Ala Asp 2045 GCG ATT GCG Ala Ile Ala 6270 6318 6366 6414 TTT GAG CTT GGT Phe Giu Leu Gly 2055 TTG GCT GAT Leu Ala Asp GCG GCA GAA Ala Ala Glu 2060 GCT GAA AAA GCC TTT GCA Ala Glu Lys Ala Phe Ala 2065 GAT AAA GCG GAA ACT GTA Asp Lys Ala Glu Thr Val 2080 GAA AGC GCA Glu Ser Ala 2070 AAA GAC AAG, Lys Asp Lys CAA TTG TCT AAA Gin Leu Ser Lys 20'75 AAT GCC CAC GAT AAA Asn Ala His Asp Lys 2085 GTC CGT Val Arg 2090 TTT GCT AAT Phe Ala Asfl GGT TTA Gly Leu 2095 AAT ACC AA.A Asn Thr Lys Val 2100 AGC GCG GCA ACG Ser Ala Ala Thr GTG GAA AGC Val Glu Ser 2105 ACT GAT GCA AAC GGC Thr Asp Ala Asn Gly 2110 ACA ACC TTT Thr Thr Phe GTG AAA Val Lys 2120 ACC GAT GTG Thr Asp Val

GAA

Giu 2125 TTG CCT TTA Leu Pro Leu GAT AAA GTG ACC Asp Lys Val. Thr 2115 ACG CAA ATC TAC Thr Gin Ile Tyr 2130 GCT GAC GGA AAA Ala Asp Gly Lys 2145 6462 6510 6558 AAT ACC GAT GCA AAC Asn Thr Asp Ala Asn 2135 TGG TAT GAA CTG, AAT Trp Tyr Glu Leu Asn 2150 CTT GGT A.AC GTG GAT Leu Gly Asn Val Asp 2165 GGT AAT AAG ATC GTT AAA AAA Gly Asn Lys Ile Val Lys Lys 2140 GCT GAT GGT ACG GCG Ala Asp Gly-Thr Ala 2155 GCA AAC GGT AAG AA.A Ala Asn Gly Lys Lys 2170 AGT AAC AAA GAA GTG ACA Ser Asn Lys Glu Val Thr 2160 6606 6654 6702 GTT GTG AAA GTA ACC Val Val Lys Val Thr 2175 Glu 2180 I AAT GGT GCG C Asn Gly Ala AAA ACC AAA Lys Thr Lys CAC GTT GTC His Val Val 2215 ;AT AAG TGG TAT ~sp Lys Trp Tyr 2185 ;GC GAA GTG AGC ;ly Glu Val Ser Z200 CGC CTT GAT CCG krg Leu Asp Pro TAC ACC AAT GCT Tyr Thr Asn Ala 2190 AAT GAT AAA. GTT Asn Asp Lys Val.

2205 GAC GGT GCT GCG GAT Asp Gly Ala Ala Asp 2195 TCT ACC GAT GAA AAA Ser Thr Asp Glu Lys 2210 AAC GGC AAA GGC GTG Asn Gly Lys Gly Val 2225 6750 6798 6846 AAC AAT Asn Asn 2220 CAA TCG Gln Ser GTC ATT GAC AAT GTG GCT Val Ile Asp Asn Val Ala 2230 AAT GGC GAA Asn Gly Glu 2235 ATT TCT GCC ACT TCC ACC GAT Ile Ser Ala Thr Ser Thr Asp 2240 6894 GCG ATT AAC GGA AGT CAG TTG TAT GCC GTG Ala Ile Asn Gly Ser Gln Leu Tyr Ala Val GCA AAA Ala Lys 2255 GGG GTA ACA Gly Val Thr

AAC

Asn 2260 2245 2250 CTT GCT GGA CAA Leu Ala Gly Gln GTG AAT AAT Val Asn Asn 2265 CTT GAG GGC AAA GTG Leu Glu Gly Lys Val 2270 AAT AAA GTG GGC Asn Lys Val Gly 2275 GCT TCA CAG TTA Ala Ser Gln Leu 2290 6942 6990 7038 AAA CGT GCA Lys Arg Ala GAT GCA Asp Ala 2280 GGT ACA GCA Gly Thr Ala AGT GCA TTA Ser Ala Leu 2285 CCA CAA GCC ACT ATG Pro Gln Ala Thr Met 2295 CCA GGT AAA TCA ATG Pro Gly Lys Ser Met 2300 GTT GCT ATT GCG GGA AGT Val Ala Ile Ala Gly Ser 2305 AGT TAT CAA GGT CAA AAT GGT TTA GCT ATC GGG GTA TCA AGA ATT TCC Ser Tyr Gln Gly Gln Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser 2310 2315 2320 GAT AAT GGC AAA GTG ATT ATT CGC TTG TCA GGC ACA ACC Asp Asn Gly Lys Val Ile Ile Arg Leu Ser Gly Thr Thr 2325 2330 2335 GGT AAA ACA GGC GTT GCA GCA GGT GTT GGT TAC CAG TGG Gly Lys Thr Gly Val Ala Ala Gly Val Gly Tyr Gln Trp 2345 2350 AAT AGT CAA Asn Ser Gln 2340

TAAAGTTTGG 7086 7134 7182 7231
ATTATCTCTC TTAAAAAGCG GCATTTGCCG CTTTTTTTAT GGGTGGCTAT TATGTATCGT 7291

INFORMATION FOR SEQ ID NO:4: (i) SEQUENCE CHARACTERISTICS:

LENGTH: 2353 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein I I I 0 0 S 00.0*0 (xi) SEQUENCE DE Met Asn Lys Ile Phe As 1 5 Val Val Val Ser Glu Le Thr Val Glu Thr Ala Va Ala Asn Ala Thr Asp Gl Ala Pro Val Leu Ser P1 Glu Val Thr Glu Asn S 85 Val Leu Lys Ala Gly A 100 Ile Lys Gin Asn Thr A 115 Ser Leu Lys Lys Asp L 130 Leu Ser Phe Gly Ala P 145 1 Asn Gly Leu Lys Leu 165 Glv Leu Asp Ser Thr 180 Ser Ser Ser Ser Phe 195 Thr Val Lys Asp Val 210 Thr Ala Gly Gly Asn 225 Asn Val Glu Phe Ile 245 Thr Ala Lys Giu Asn 260 Thr Ser Val Ile Lys 275 66 SCRIPTION: SEQ ID NO n Val Ile Trp Asn Va 10 u Thr Arg Thr His Th 25 1 Leu Ala Thr Leu Le 40 u Asp Glu Glu Leu As 55 he His Ser Asp Lys Gl er Asn Trp Gly Ile T) 90 la Ile Thr Leu Lys A 105 sp Glu Ser Thr Asn A 120 eu Thr Asp Leu Thr S 135 sn Gly Asp Lys Val A 50 1 Ala Lys Thr Gly Asn G 170 Leu Pro Asp Ala Val 185 Thr Pro Asn Asp Val 200 Leu Asn Ala Gly Trp 215 Val Glu Ser Val Asp 230 Thr Gly Asp Lys Asn 250 Gly Lys Thr Thr Glu 265 Glu Lys Asp Gly Lys 280 :4: 1 Met Th r Lys Ar u Phe Al 4 p Pro Va u Gly Th '5 yr Phe As la Gly As la Ser S 1 er Val A 140 sp Ile T ly Asn V rhr Asn T Glu Lys Asn Ile 220 Leu Val 235 Thr Leu Val Lys Leu Phe G1 g Al 3 a Th 1 Va r Gl p As p As 11 er P la T hr S al H hr C Thr 205 Lys Ser Asp Phe Thr 285 a 0 r 1 y n sn 1 he h e 19 Ar

G]

A

V

T

2 c Thr Tr Ser Al Val G1 Arg Th Glu Ly 8 Lys Gl Leu L) Thr T Glu L r Asp A s Leu A 175 y Val I 0 g Ala y Ala la Tyr al Val 255 hr Pro y 70 ;ly Lys a n r

'S

~0 Ps y yr ys la sn Leu Ala Lys Asn 240 Leu Lys Glu 67 Asn Asn Asn Asp Thr Asn Ly 290 s Val Thr Ser Thr Ala Thr Asp Asn Thr 300 295 Asp C 305 Asn Gly Gly Ile Ser 385 Gly Asn Lys Gin Leu 465 Al a Gly Pro Lys 1u -,ys ksp ksp rhr 370 Asp Lys Ala Ala Glu 450 Lys Leu Asr, Al a Asi Gly A Ala G Phe Gly 'I 355 Val I Lys Val Gly Glu 435 ValI Val1 Thr Asp Gly 515 Gly ~sn C ay I ~la 140 'hr .ys ys kla Asp 420 Al a Lys Lys Gly Ala 500 Asn Ile ;ly rrp L'hr rhr ryr Ile Giu 405 Leu Asp Al a Leu 310 Arg Val Ala Asp Val 390 Ile Val Thr Gly Val Val Al a Ser Ala 375 Ala Al a Thr Asp Glu 455 Thr Lys Ser Val 360 Lys Asp Lys Al a Gly 440 Thr Ala Thr Gly 345 Thr Val Thr Glu Leu 425 *Ala *Val Lys Thr 330 Thr Lys Gly Thr Asp 410 Gly Leu Thr Ala 315 Thr Asn Asp Asp Ala 395 Asp Asn *Glu *Phe Val Ala Val Thr Gly 380 Leu Lys Leu Gly Lys Ile Asn Thr Asn 365 Leu Thr Lys Ser Ile 445 Al a Asp Gly Phe 350 Gly Lys Val1 Lys Trp 430.

Ser Gly kl a Gin 335 Giu Asn Phe Thr Leu 415 Lys Lys Lys Val 320 Asn S er Gly Asp Gly 400 Val1 Ala Asp Asn 460 Gin Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gin Asp 470 475 Leu 485 Lys Gly Lys Thr Thr Gly Al a Ser Val Thr Gly Ile Ile Thr 520 Asn Thr Asn 505 Gly Lys Leu 490 Lys Thr Al a Gly Asp Asn Ile Gly Gly Thr Thr Thr Thr Leu Thr 510 Ile Ser 525 Asn Val Asn 495 Ile Val Al a Gly Thr Thr Ser 530 535 540 Gly 545 Al a Leu Thr Arg Asp Ala Leu Glu 580 Tyr Asfl 565 Lys Asp Asp 550 Arg His Asfl Ala Al a Val1 Asn Glu Phe Asp 570 Gin Asp 555 Al a Pro Val1 Tyr Leu Leu Lys Val Asn Gly Thr 590 Asn Leu 575 Asp Ser 560 Leu Ser Asn Leu Asn Asfl Lys 5 85 Thr Ala Ala Thr Val Gly Asp Leu Arg Leu Gly Trp Val Val Ser 605 595 600 a Thr Glu 625 Giu Asp Asn Val Asp 705 Lys Al a Asn Leu Asn 785 Val Thr Phe Lys Ser 865 lal. I AsnC rhr 690 Arg Lys Thr Pro Thr '770 Ile Ser Al a Ala Gly 850 His snf ~eu ;iy ;ly ksn 675 Ly Val1 Phe Thr 15 5 Phe Thr Asp Thr Lys 835 Ilf Va Gly TI Phe TI Lys F, 6 Leu C 660 Thr Glv C Lys Ala Val 740 Asp Lys Phe pThr Pro 820 Glu Ala 1. Asp hr ~hr [is ;45 ;iu ~sp ;iy rhr 725 Lys Asn Ala Asp Let Ly~ Th: Th Le Lys Gly 630 Thr Lys Asn Phe Thr 710 Val Thr Gly Gly Leu 790 1Thr ;Val r Al~ r Th: u As: Giu Glu S 615 Ala Gly A Ile Thr V Asp Gly A~ 6 Val Leu TI 680 Glu Thr 1 695 Val Lys I Lys Asp Giu Asn Lys Asp 760 *Lys Asn 775 *Ala Lys Slie Gly -Asn Ile i Asp Ala 840 Leu Thr 855 Val Asp er l1a al ~sp .65 'hr al1 ~sp Jal Leu 745 Lei.

Asn Al a Ser 650 Thr Val Lys Al a Al a 730 Thr Ala Lys Gin Thr 635 Val1 Ile Gly Thr Thr 715 Thr Thr Leu Val Val Lys G 620 Val Thr S Ala Glu I1 Lys Leu I Asn Asn C 685 Gly Ala 700 Ala Asn Ala Ile Ser Ile *Lys Ala 765 *Lys Arg 780 Val Lys Pro Thr r Ala Asp r Lys Asfl in erI hr ~ys ;iy rhr ksp Asn Asp 750 Gly Asp Thr Gly Gly 830 Vali la .ys -,ys la 1 rhr ksp Al a Ser 735 Glu Asp Giy Ala Gi') 81! Lei Ty: Asp S er 640 Ala Asp Al a Al a Asp '720 Al a Asp Thr Lys Lys 800 Thr .i Asn r Leu Asn Gly Thr 825 Ser Giu Al a Leu Asn 810 Ser Gly Pro Thr GitL 795 Th~ Th~ Se 845 Al a 860 Lys Gly Ser Al a Asn Lys Ala Ser Ala 880 870 875 Ser Ile Giu Asp Val1 885 Leu Arg Ala Gly Trp, 890 Asn Ile Gin Gly Asn Gly 895 69 Asn Asn Val Asp Tyr Val Ala Thr Tyr Asp Thr Val Asn Phe Thr Asp 900 905 910 Asp Ser Thr Gly Thr Thr Thr Val Thr Val Thr Gln Lys Ala Asp Gly 915 920 925 Lys Gly Ala Asp Val Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp 930 935 940 His Asn Gly Lys Leu Phe Thr Gly Lys Asp Leu Lys Asp Ala Asn Asn 945 950 955 960 Gly Ala Thr Val Ser Glu Asp Asp Gly Lys Asp Thr Gly Thr Gly Leu 965 970 975 Val Thr Ala Lys Thr Val Ile Asp Ala Val Asn Lys Ser Gly Trp Arg 980 985 990 Val Thr Gly Glu Gly Ala Thr Ala Glu Thr Gly Ala Thr Ala Val Asn 995 1000 1005 Ala Gly Asn Ala Glu Thr Val Thr Ser Gly Thr Ser Val Asn Phe Lys 1 010 1015 1020 Asn Gly Asn Ala Thr Thr Ala Thr Val Ser Lys Asp Asn Gly Asn Ile 1025 1030 1035 1040 Asn Val Lys Tyr Asp Val Asn Val Gly Asp Gly Leu Lys Ile Gly Asp 1045 1050 1055 Asp Lys Lys Ile Val Ala Asp Thr Thr Thr Leu Thr Val Thr Gly Gly 1060 1065 1070 Lys Val Ser Val Pro Ala Gly Ala Asn Ser Val Asn Asn Asn Lys Lys 1075 1080 1085 SLeu Val Asn Ala Glu Gly Leu Ala Thr Ala Leu Asn Asn Leu Ser Trp 1090 1095 1100 Thr Ala Lys Ala Asp Lys Tyr Ala Asp Gly Glu Ser Glu Gly Glu Thr 1105 1110 1115 1120 Asp Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys Ala Gly Lys 1125 1130 1135 Asn Leu Lys Val Lys Gin Ser Glu Lys Asp Phe Thr Tyr Ser Leu Gin 1140 1145 1150 Asp Thr Leu Thr Gly Leu Thr Ser 
Ile Thr Leu Gly Gly Thr Ala Asn 1155 1160 1165 Gly Arg Asn Asp Thr Gly Thr Val Ile Asn Lys Asp Gly Leu Thr Ile 1170 1175 1180 Thr Leu Ala Asn Gly Ala Ala Ala Gly Thr Asp Ala Ser Asn Gly Asn 1185 1190 1195 T Thr Ile Ser Val Thr Lys Asp Gly Ile Ser Ala Gly Asn Lys Glu ile 1205 1210 1215 Thr Asn Val Lys Ser Ala Leu Lys Thr Tyr Lys Asp Thr Gln Asn Thr 1220 1225 1230 Ala Asp Glu Thr Gln Asp Lys Glu Phe His Ala Ala Val Lys Asn Ala 1235 1240 1245 Asn Glu Val Glu Phe Val Gly Lys Asn Gly Ala Thr Val Ser Ala Lys 1250 1255 1260 Thr Asp Asn Asn Gly Lys His Thr Val Thr Ile Asp Val Ala Glu Ala 1265 1270 1275 1280 Lys Val Gly Asp Gly Leu Glu Lys Asp Thr Asp Gly Lys Ile Lys Leu 1285 1290 1295 Lys Val Asp Asn Thr Asp Gly Asn Asn Leu Leu Thr Val Asp Ala Thr 1300 1305 1310 Lys Gly Ala Ser Val Ala Lys Gly Glu Phe Asn Ala Val Thr Thr Asp 1315 1320 1325 Ala Thr Thr Ala Gln Gly Thr Asn Ala Asn Glu Arg Gly Lys Val Val 1330 1335 1340 Val Lys Gly Ser Asn Gly Ala Thr Ala Thr Glu Thr Asp Lys Lys Lys 1345 1350 1355 1360 e Val Ala Thr Val Gly Asp Val Ala Lys Ala Ile Asn Asp Ala Ala Thr 1365 1370 1375 Phe Val Lys Val Glu Asn Asp Asp Ser Ala Thr Ile Asp Asp Ser Pro 1380 1385 1390 Thr Asp Asp Gly Ala Asn Asp Ala Leu Lys Ala Gly Asp Thr Leu Thr 1395 1400 1405 Leu Lys Ala Gly Lys Asn Leu Lys Val Lys Arg Asp Gly Lys Asn Ile 1410 1415 1420 Thr Phe Ala Leu Ala Asn Asp Leu Ser Val Lys Ser Ala Thr Val Ser 1425 1430 1435 1440 Asp Lys Leu Ser Leu Gly Thr Asn Gly Asn Lys Val Asn Ile Thr Ser 1445 1450 1455 Asp Thr Lys Gly Leu Asn Phe Ala Lys Asp Ser Lys Thr Gly Asp Asp.

1460 1465 Ala Asn Ile His Leu Asn Gly Ile Ala Ser Thr Leu Thr Asp Thr Leu 1475 1480 1485 Leu Asn Ser Gly Ala Thr Thr Asn Leu Gly Gly Asn Gly Ile Thr Asp 1490 1495 1500 71 Lys Asp Val 1515 Asn Glu Lys Lys Arg Ala Ala Ser Val 1505 1510 Leu Asn Ala Gly 1520 1 V Il -11u 4 0* Trp Asn Val Arg Gly Val Lys Pro Ala ser Ala Asn 1525 1530 Asn Ile Asp Phe Val Ala Thr Tyr Asp Thr Val Asp 1 1540 1545 Asp Lys Asp Thr Thr Ser Val Thr Val Glu Ser Lys 1555 1560 Arg Thr Glu Val Lys Ile Gly Ala Lys Thr Ser Val 1570 1575 1580 Asn Gly Lys Leu Phe Thr Gly Lys Glu Leu Lys Asp 1585 1590 1595 Gly Val Thr Val Thr Glu Thr Asp Gly Lys Asp Glu 1605 1610 Val Thr Ala Lys Ala Val Ile Asp Ala Val Asn Lys 1620 1625 Val Lys Thr Thr Gly Ala Asn Gly Gln Asn Asp Asp 1635 1640 Ala Ser Gly Thr Asn Val Thr Phe Ala Asp Gly Asn 1650 1655 166C Glu Val Thr Lys Ala Asn Asp Gly Ser Ile Thr Val 1665 1670 1675 Lys Val Ala Asp Gly Leu Lys Leu Asp Gly Asp Lys 1685 1690 Thr Thr Val Leu Thr Val Ala Asp Gly Lys Val Thr 1700 1705 1535 ?he Val Ser Gly 1550 Asp Asn Gly Lys 1565 Ile Lys Asp His Ala Asn Asn Asn 1600 Gly Asn Gly Leu 1615 Ala Gly Trp Arg 1630 Phe Ala Thr Val 1645 Gly Lys Ile Ala Thr Tyr Thr Asn Ala Val 1680 Asp Asn Val Ala 169 Pro Asn 1710 Gly Asp Gly Lys Lys Phe Val Asp Ala Ser Gly Leu Ala Asp Ala Leu 1715 1720 1725 Asn Lys Leu Ser Trp Thr Ala Thr Ala Gly Lys Glu Gly Thr Gly Glu 1730 1735 1740 Val Asp Pro Ala Asn Ser Ala Gly Gln Glu Val Lys Ala Gly Asp Lys 1745 1750 1755 1760 Val Thr Phe Lys Ala Gly Asp Asn Leu Lys Ile Lys Gln Ser Gly Lys 1765 1770 1775 Asp Phe Thr Tyr Ser Leu Lys Lys Glu Leu Lys Asp Leu Thr Ser Val 1780 1785 1790 Glu Phe Lys Asp Ala Asn Gly Gly Thr Gly Ser Glu Ser Thr Lys Ile 1795 1800 1805 Thr Lys Asp Gly Leu Thr Ile Thr Pro Asn Gly Ala Gly Ala Ala 1820 1810 1815 Gly Ala Asn Thr Ala Asn Thr Ile Ser Val Thr Lys Asp Gly Ile Ser 1825 1830 1835 1840 Ala Gly Asn Lys Ala Val Thr Asn Val Val Ser Gly Leu Lys Lys Phe 1845 1850 1855 Gly Asp Gly His Thr Leu Ala Asn Gly Thr Val Ala Asp Phe Glu Lys 
1860 1865 1870 His Tyr Asp Asn Ala Tyr Lys Asp Leu Thr Asn Leu Asp Glu Lys Gly 1875 1880 1885 Ala Asp Asn Asn Pro Thr Val Ala Asp Asn Thr Ala Ala Thr Val Gly 1890 1895 1900 Asp Leu Arg Gly Leu Gly Trp Val Ile Ser Ala Asp Lys Thr Thr Gly 1905 1910 1915 1920 Glu Pro Asn Gln Glu Tyr Asn Ala Gln Val Arg Asn Ala Asn Glu Val 1925 1930 1935 Lys Phe Lys Ser Gly Asn Gly Ile Asn Val Ser Gly Lys Thr Leu Asn 1940 1945 1950 Gly Thr Arg Val Ile Thr Phe Glu Leu Ala Lys Gly Glu Val Val Lys 1955 1960 1965 Ser Asn Glu Phe Thr Val Lys Asn Ala Asp Gly Ser Glu Thr Asn Leu 1970 1975 1980 Val Lys Val Gly Asp Met Tyr Tyr Ser Lys Glu Asp Ile Asp Pro Ala 1985 1990 1995 2000 Thr Ser Lys Pro Met Thr Gly Lys Thr Glu Lys Tyr Lys Val Glu Asn 2005 2010 2015 Gly Lys Val Val Ser Ala Asn Gly Ser Lys Thr Glu Val Thr Leu Thr 2020 2025 2030 Asn Lys Gly Ser Gly Tyr Val Thr Gly Asn Gln Val Ala Asp Ala Ile 2035 2040 2045 Ala Lys Ser Gly Phe Glu Leu Gly Leu Ala Asp Ala Ala Glu Ala Glu 2050 2055 2060 Lys Ala Phe Ala Glu Ser Ala Lys Asp Lys Gln Leu Ser Lys Asp Lys 2065 2070 2075 2080 Ala Glu Thr Val Asn Ala His Asp Lys Val Arg Phe Ala Asn Gly Leu 2085 2090 2095 Asn Thr Lys Val Ser Ala Ala Thr Val Glu Ser Thr Asp Ala Asn Gly 2100 2105 2110 Asp Lys Val Thr Thr Thr Phe Val Lys Thr Asp Val Glu Leu Pro Leu 2115 2120 2125 Thr Gln Ile Tyr Asn Thr Asp Ala Asn Gly Asn Lys Ile Val Lys Lys 2130 2135 2140 Ala Asp Gly Lys Trp Tyr Glu Leu Asn Ala Asp Gly Thr Ala Ser Asn 2145 2150 2155 2160 Lys Glu Val Thr Leu Gly Asn Val Asp Ala Asn Gly Lys Lys Val Val 2165 2170 2175 Lys Val Thr Glu Asn Gly Ala Asp Lys Trp Tyr Tyr Thr Asn Ala Asp 2180 2185 2190 Gly Ala Ala Asp Lys Thr Lys Gly Glu Val Ser Asn Asp Lys Val Ser 2195 2200 2205 Thr Asp Glu Lys His Val Val Arg Leu Asp Pro Asn Asn Gln Ser Asn 2210 2215 2220 Gly Lys Gly Val Val Ile Asp Asn Val Ala Asn Gly Glu Ile Ser Ala 2225 2230 2235 2240 Thr Ser Thr Asp Ala Ile Asn Gly Ser Gln Leu Tyr Ala Val Ala Lys 2245 2250 2255 Gly Val Thr Asn Leu Ala Gly Gln Val Asn Asn Leu Glu Gly Lys Val 2260 2265 2270 Asn Lys Val Gly Lys Arg Ala Asp Ala Gly Thr Ala Ser Ala Leu Ala 2275 2280 2285 Ala Ser Gln Leu Pro Gln Ala Thr Met Pro Gly Lys Ser Met Val Ala 2290 2295 2300 Ile Ala Gly Ser Ser Tyr Gln Gly Gln Asn Gly Leu Ala Ile Gly Val 2305 2310 2315 2320 Ser Arg Ile Ser Asp Asn Gly Lys Val Ile Ile Arg Leu Ser Gly Thr 2325 2330 2335 Thr Asn Ser Gln Gly Lys Thr Gly Val Ala Ala Gly Val Gly Tyr Gln 2340 2345 2350 Trp

INFORMATION FOR SEQ ID NO:5: SEQUENCE CHARACTERISTICS: LENGTH: 658 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) (xi) Met 1 Val Thr Ala Ala Val Ser Le Gl Ly 14 Il

SE

T

L

I

2

I

MOLECULE TYPE: protein SEQUENCE DESCRIPTION: SEQ ID Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val 5 Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Val Ala Val Ala Val Leu Ala Thr Leu Leu Ser Asn Asn Asn Thr Pro Val Thr Asn Lys Leu Ly 55 Asn Phe Asn Phe Thr Asn Asn Ser Ile Ala As 70 Gin Glu Ala Tyr Lys Gly Leu Leu Asn Leu As Asp Lys Leu Leu Val Glu Asp Asn Thr Ala Al 100 105 i Arg Lys Leu Gly Trp Val Leu Ser Ser Lys As 115 120 u Lys Ser Gin Gin Val Lys His Ala Asp Glu V 130 135 s Gly Gly Val Gin Val Thr Ser Thr Ser Glu A 5 150 155 e Thr Phe Ala Leu Ala Lys Asp Leu Gly Val L 165 170 r Asp Thr Leu Thr Ile Gly Gly Gly Ala Ala 180 185 hr Pro Lys Val Asn Val Thr Ser Thr Thr Asp 195 200 ys Asp Ala Ala Gly Ala Asn Gly Asp Thr Thr 210 215 le Gly Ser Thr Leu Thr Asp Thr Leu Val Gly 25 230 235 le Asp Gly Gly Asp Gin Ser Thr His Tyr Thr 245 250 ,ys Asp Val Leu Asn Ala Gly Trp Asn Ile Lys 260 265 Thr Gun Cys Ali 30 Ala Th 45 Ala Ty p Ala Gl n Glu Ly a Thr Va n Gly T 125 al Leu P sn Gly L ys Thr la Gly Ily Leu 205 Val His 220 Ser Pro Arg Ala Gly Val r r u s h y

L

Thr Trp 1.5 Ser Ala Val Giu Gly Asp Lys Gin Asn Ala 95 Gly Asn r Arg Asn e Glu Gly s His Thr 160 a Thr Val 1-75 la Thr Thr ys Phe Ala eu Asn Gly la Thr His 240 Ala Ser Ile 255 Lys Ala Gly 2-70

S

Ser Th Thr Va 29 Asp Se 305 Thr Se Asn L Asp G Asn L 3 Gly A 385 Gly A Val L Lys I Ala Lys 465 Ser Asn Ala Ser Gly 545 Thr r 1 0 r er 's lu ys 70 sp sn ys le Asr Ly Tr Al Gl Le 53 Ai

P

Thr Gly 275 Glu Phe Lys Glu Val Ile Glu Thr 340 Gly Lys 355 Thr Gly Phe Ala Gly Thr Tyr Asp 42 Ala Al 435 Asn Pr 0 s Leu Va p Thr Th a Ser G1 y Lys A 515 u Gin A 0 n Asn G ro Ala A Gin Ser Gl Leu Ser Al 29 Asn Gly Ly 310 Lys Glu Ly 325 Asn LyS V Gly Leu V Trp Arg I 3 Thr Val A 390 Thr Ala T 405 Ala Lys V Asp Thr I o Lys Gly 1 Thr Ala 470 r Thr Ala 485 u Gin Glu 3n Leu Lys sp Ala Leu ly Ala Lys 550 sn Gly Ala 565 u As 28 a As 5 s Ar s As l As al Th 36 le L) 75 la S hr V al G hr A 4 Lys 455 Lys Ala Val Val Thr 535 Thr Gly 0 p g p p r 0 e a 4

G

L

5 c Val AsI Thr Gl Thr G1 Gly Ly 33 Gly Al 345 Ala Ly Thr Th r Gly T 1 Thr A 4: y Asp G 425 a Leu T 0 l Ala A Ly Leu N lu Ala ys Ala 505 ys Gin 20 ly Leu Glu Ile Ala Asn Phe u Th u Va 31' s Le 0 a As s As r As ir As 35 sn G 10 ly L hr V sp V al T 4 Asp 490 Gly Glu Thr Asn Asn 570 5

LI

l u n p p n 9 ly ea a a h Asc

G

S

L

5 Val His Thr 285 Thr Thr Val 300 Lys Ile G1 Phe Thr G1 Ala Thr Gi 35 Val Ile As 365 Ala Asn Gl 380 Val Thr P1 Thr Asp G Lys Leu A 4 1 Asn Asp G 445 1 Ala Ser T 460 r Ala Leu A 5 y Gly Thr p Lys Val ly Ala Asfl 525 er Ile Thr 540 ys Asp Gly 55 la Asn Thr

Y

u 0 p y P1

L

he Ly 3C 1 h s Le T1

P

L

I

Tyr Asp Thr Val Ala Lys 320 Lys Ala 335 Asp Ala Ala Val Gin Asn Ala Ser 400 Ile Thr 415 Gly Asp y Lys Asn r Asp Glu n Ser Leu 480 u Asp Gly 495 r Phe Lys he Thr Tyr eu Gly Thr eu Thr Ile 560 :le Ser Val 575 76 Thr Lys Asp Gly Ile Ser Ala Gly Gly Gln Ser 580 585 Ser Gly Leu Lys Lys Phe Gly Asp Ala Asn Phe 595 600 Ser Ala Asp Asn Leu Thr Lys Gln Asn Asp Asp 610 615 Thr Asn Leu Asp Glu Lys Gly Thr Asp Lys Glr 625 630 635 Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arc 645 650 Ile Ser INFORMATION FOR SEQ ID NO:6: SEQUENCE

CHARACTERISTICS:

LENGTH: 607 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: Met Asn Lys Ile Phe Asn Val Ile Trp Asn V 1 5 Val Val Val Ser Glu Leu Thr Arg Thr His T 25 Arg Gly Asp Pro Val Leu Ala Thr Leu Leu P Asn Ala Thr Asp Glu Asp Glu Glu Leu Asp I Pro Val Leu Ser Phe His Ser Asp Lys Glu 70 Val Thr Glu Asn Ser Asn Trp Gly Ile Tyr Leu Lys Ala Gly Ala Ile Thr Leu Lys Ala 100 105 Lys Gln Xaa Thr Asp Glu Xaa Thr Asn Ala 115 120 Leu Lys Lys Asp Leu Thr Asp Leu Thr Ser 130 135 Val Asp Ala 620 n Thr g Gly Lys Asn Val 590 Pro Leu Thr 605 Tyr Lys Gly Pro Val Val Leu Gly Trp 655 Val Ser Leu Ala 640 Val al hr he Pro Gly Phe G1 Se Va Met Thr G: Lys Arg L 3 Ala Thr V Val Val A .Thr Gly Asp Asn y Asp Asn r Ser Phe 125 .1 Ala Thr 140 1n eu 0 'al irg 3lu Lys Le i4 Th Gl Thr T Arg A Gin A Thr 3 Lys Gly u Lys r Tyr u Lys rp ,sn la .la Glu Val Xaa Ser Leu Ser Phe 145 Gly Le Leu As Ser Se Val Ly 21 Ala Gl 225 Val Gl Ala L) Ser V Asn A 2 Glu G 305 Lys A Asp P Asp Thr Asp 385 Lys Ala Ala e p r

S

0 y u

PS

aLl 9 h 31

V

3

L

V

G

c Gly Ala Lys Le.

Ser Thr 184 Ser Ph 195 Asp Va Gly As Phe Il Glu As Ile L' 275 Thr A 0 v Asn G a Gly T e Ala I y Thr 355 al Lys ys Lys al Ala ly Asp lu Ala 435 Asn Gly 150 Ala Lys 165 Leu Prc Thr Pr L Leu As a Val Gl 23 e Thr Gl 245 n Xaa Ly 0 s Glu L ;n Lys V ly Leu V 3 rp Arg V 325 hr Val A hr Ala ryr Asp Ile Val Glu Ile 405 Leu Val 420 Asp Thr n u 0 y

S

a 1 a

A

3

A

I

77 Asp Lys Val Thr Gly Asn Asp Ala Val 165 Asn Asp Va 200 Ala Gly Tr 215 Ser Val As Asp Lys As Thr Thr Gi 26 Asp Gly L) 280 1 Thr Ser A 295 1 Thr Ala L 0 1 Lys Thr T a Ser Gly T 3 r Val Thr 360 la Lys Val 375 la Asp Thr 90 la Lys Glu hr Ala Leu Asp Gly Ala 440

L

p p n u y h t 4

G

T

P

Asp Ile 155 Gly Asn 170 Thr Asn G)u Lys Asn Il Leu Va 23 Thr Le 250 Val Ly Leu P1 n Thr A s Ala V 3 r Thr A 330 r Asn V ys Asp I ly Asp hr Ala sp Asp 410 Gly Asn 125 Leu Glu e 1 5 u

S

e a 1 1 a

L

3

L

Thr Ser Asp Ala Asn 160 Val His Leu Asn Gly 175 Thr Gly Val Leu Ser 190 Thr Arg Ala Ala Thr 205 Lys Gly Ala Lys Thr 220 Ser Ala Tyr Asn Asn 240 Asp Val Val Leu Thr 255 Phe Thr Pro Lys Thr 2710 Thr Gly Lys Glu Asn 28.5 a Thr Asp Asn Thr Asp 300 1 Ile Asp Ala Val Asn 5 320 a Asn Gly Gin Asn Gly 335 l Thr Phe Glu Ser Gly 350 r Asn Gly Asn Gly Ile 365 ly Leu Lys Phe Asp Ser 360 eu Thr Val Thr Gly Gl1 95 400 vs Lys Lys Leu Val Asn 415 Leu Ser Trp Lys Ala Lys 430 G1y Ile Ser Lys Asp Gin 445 78 Glu Val Lys Ala Gly Glu Thr Val Thr Phe Lys Ala Gly Lys Asn Leu 450 455 460 Lys Val Lys Gin Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gln Asp Ala 465 470 475 480 Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Thr Asn Gly Gly 485 490 495 Asn Asp Ala Lys Thr Val Ile Asn Lys Asp Gly Leu Thr Ile Thr Pro 500 505 510 Ala Gly Asn Gly Gly Thr Thr Gly Thr Asn Thr Ile Ser Val Thr Lys 515 520 525 Asp Gly Ile Lys Ala Gly Asn Lys Ala Ile Thr Asn Val Ala Ser Gly 530 535 540 Leu Arg Ala Tyr Asp Asp Ala Asn Phe Asp Val Leu Asn Asn Ser Ala 545 550 555 560 Thr Asp Leu Asn Arg His Val Glu Asp Ala Tyr Lys Gly Leu Leu Asn 565 570 575 Leu Asn Glu Lys Asn Ala Asn Lys Gln Pro Leu Val Thr Asp Ser Thr 580 585 590 Ala Ala Thr Val Gly Asp Leu Arg Lys Leu Gly Trp Val Val Ser 595 600 605 INFORMATION FOR SEQ ID NO:7: SEQUENCE

CHARACTERISTICS: LENGTH: 24 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr Trp 1 5 10 15 Val Val Val Ser Glu Leu Thr Arg 20

INFORMATION FOR SEQ ID NO:8: SEQUENCE CHARACTERISTICS: LENGTH: 24 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp 1 5 10 15 Val Val Val Ser Glu Leu Thr Arg 20

INFORMATION FOR SEQ ID NO:9: SEQUENCE CHARACTERISTICS: LENGTH: 24 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu 1 5 10 15 Val Ala Val Ser Glu Leu Ala Arg 20

INFORMATION FOR SEQ ID NO:10: SEQUENCE CHARACTERISTICS: LENGTH: 24 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu 1 5 10 15 Val Ala Val Ser Glu Leu Ala Arg 20

INFORMATION FOR SEQ ID NO:11: SEQUENCE CHARACTERISTICS: LENGTH: 24 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: Met Asn Lys Ala Tyr Ser Ile Ile Trp Ser His Ser Arg Gln Ala Trp 1 5 10 15 Ile Val Ala Ser Glu Leu Ala Arg 20

INFORMATION FOR SEQ ID NO:12: SEQUENCE CHARACTERISTICS: LENGTH: 24 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: Met Asn Arg Ile Tyr Ser Leu Arg Tyr Ser Ala Val Ala Arg Gly Phe 1 5 10 15 Ile Ala Val Ser Glu Phe Ala Arg 20

INFORMATION FOR SEQ ID NO:13: SEQUENCE CHARACTERISTICS: LENGTH: 24 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: Met Asn Lys Ile Tyr Tyr Leu Lys Tyr Cys His Ile Thr Lys Ser Leu 1 5 10 15 Ile Ala Val Ser Glu Leu Ala Arg 20

INFORMATION FOR SEQ ID NO:14: SEQUENCE CHARACTERISTICS: LENGTH: 2037 base pairs TYPE: nucleic acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT

TGTCGTATCT

GAACTCACTC

ACCCTGTTGT

GAGTTAGAAC

ACTGGAGAAC

GCAGTAGGAA

GGCAATGACT

GAAAAATTAT

TTGAAATTGG

ATTGCTTCGA

ATTGATGCGG

AATATCCAAG

GTCAATGGCG

GTCCGTGTGG

GCACCCACAC

CCGCAACGGT

CCGTACAACG

AAGAGGGAAC

GCAGCACAAT

TCACCTACTC

CGTTTGGCGC

CGAAALACAGG

CTTTGACCGA

TTAATTATCA

GCATATGGAAA

CGAATGCCAA

ATGTAACAGG

CAAATGCGCC

TCAGGCGAAT,

CTCTGTTTTA,

AACAGAGGTA

CACCTTCAAA

GCTGAAAAAA

AAACGGCAAT

TAACGGAAAT

TACGCTTGCC

TCGCGCTGCA

CAATGTCGAT

TGTGAGCGTT

CTTGCCGGTT

3CTACCGATG kGGTGGAGCT kTAAATTTGA 3CCGGCGACA 3AGCTGAAAA

TAAGTTGATA

GGTCAA.AACA

GGTGGCACAA

AGCGTACAAG

TTTGTCCGTA

ACGGCTGATA

CAATATGTTA

AAAGATGACC

AAAGTGAAAI

GGCACGGAAC

GTTACGTTGJ

AAAACGAAGA I TCAAATCCGC I ACACAGATTC

J

ACCTGAAAAT

ACCTGACCAG

TTACCAGTGA

GTAATGTTCA

CAGGACACGT

ATGTGTTAAA

CTTACGACAC

CGGCTCACAA

CGGAAGACGG

GTTCGGCGGA

TGGTATCGGC

ACACC-GATGC

k GCACGAGCAA TCCGCCACCG TGGCAGTTGC .TATTGGCA

GATGAAGAA

7AAGGAAGGC

,TCAGGAAA.T

AAACAAAGC

GTTGAAACT

GCAP.ATGGC

:TTAAACGGT

rGACACCAAC 2AGCGGTTGG

ZGTGGACTTT

A.AAGACAACT

CAAAACCGTT

rATGAATCAA

A.AGCGGTACA

GGTLCAGCTTT

TGCTTATGCC

TGGTTTGAAT

CGA'rACGGTT

TTCAATTTCA

AAGCC-TGAAC

TGGTACATCC

120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1S60 GTGAA.AGTGG GCA ATGAGTA TTACAAAGCC AAAGTCGAkA

AATC-CGGTGA

AAGCAATTAA

AATGGCGGTA

TTTAAATTTA

ACTTTTACGC

AAAGGTGCAA

ACGGCGAGCT

AAATTAGCAA

AAGCCTTGCA

GGCGAACC

TGTTGCAGAC

AGACAAACAG

CAGATAACGA CGGCGGCAAG GCAACTCAAA

CTTTAAGCAA

AATCTAGCGA

CGAAAAAAGG

ATACAACTGA

TGGCGAGTTG

TTCGGTACAG

AGGTTTGGTT

TTGAAAATTA

GTTGGCGATG

GAGGCTTCTG

GTCGGCAGCG

GCGCGACCGG

ATGGCAAGGC

AATTGGTTGA

GCGAGCTTGA

AAACTGGGTT GGAAAGTAGG

GGTTGAGAAA

AAGGAAACTT

GTCAAACAAG

AGCGTGGAGT

TAGTGAAGTC

AGGGCACAAA

TTAAAGACAC

GGGCGATAAA

GTAACTTTGA

CTTCACTTAC

GCGCTCAAAG

GGCGAATGGT,.GCAAACGGTG

A.AGCCGGCGA

ATGAATTGAC

CAAGCACGA.A

cAXICTGAAG

GGGCGTGAAG

GATTACCAAA

82 e rnhr Irr-, GACGGCTTGA CCATTACGCT GGCAAACGGT GCGAATGjGATG AAGATTAAAG TTGCTTCGGA CGGCATTAGC GCGGGTAATA AAGCAGTTAA

AAA

GCAGGCGAAA TTTCTGCCAC TTCCACCGAT GCGATTAACG GAAGCCAGTT

GTA

GCAAAAGGGG TAACAAACCT TGCTGGACAA GTGAATAATC TTGAGGGCAA

AGI

GTGGGCAAAC GTGCAGATGC AGGTACTGCA AGTGCATTAG CGGCTTCACA

GT'I

GCCACTATGC CAGGTAA.ATC AATGGTTTCT ATTGCGGGAA GTAGTTATCA

AGC

GGTTTAGCTA TCGGGGTATC AAGAATTTCC GATAATGGCA AAGTGATTAT

TCC

GGCACAACCA ATAGTCAAGG TAAAACAGGC GTTGCAGCAG GTGTTGGTTA

CCJ

INFORMATION FOR SEQ ID NO:15: SEQUENCE CHARACTERISTICS: LENGTH: 679 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr 1 5 Val Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Cys

TGCCGAC

CGTCGCG

.TGCC!GTG

GAATAAA

~ACCACAA

ITCAAAAT

.CTTGTCT

%GTGG

1620 1680 1740 1800 1860 1920 1980 2037 Gin Thr Trp Ala Ser Ala Thr Val Ala Va1 Ala Vai Leu Ala Thr Leu Leu Ser Ala Thr Val Gln 40 Ala Asn Ala Thr Asp Glu Asn Glu Asp Asp Glu Glu Giu Leu Giu Pro 55 Val Gin Arg Ser Val Leu Arg Trp Ser Phe Lys Ser Ala Lys Glu Gly 70 75 Thr Gly Giu Gin Giu Gly Thr Thr Giu Val Ile Asn Leu Asn Thr Asp 90 Ser Ser Gly Asn Ala Val Gly Ser Ser Thr Ile Thr Phe Lys Ala Gly 100 105 110 Asp Asn Leu Lys Ile Lys Gln Ser Gly Asn Asp Phe Thr Tyr Ser Leu 115 120 125 Lys Lys Glu Leu Lys Asn Leu Thr Ser Val Giu Thr Giu Lys Leu Ser 130 135 140 83 Val Asp Phe Gly Ala Asn Gly Asn Lys 145 150 Ile Thr Ser Asp Ala Asn Gly Leu Lys Leu Ala Lys 1 165 His Thr Al a Asn 225 Val1 Lys Val1 Lys Gly 305 Asnr Ala~ Leu Thr Ala 210 Gly Asn Lys Thr Al a 290 Glu Pro Val Asn Gly Ile 180 Gly His Val 195 Ser Val Gin Asn Asn Val Gly Ala Asn 245 Thr Thr Val 260 Giu Asp Gly 275 Lys Asp Asp Leu Ala Lys Val. Lys Ile 325 Ser Phe Lys rhr Gly kla Ser ksp Thr Asp Val 215 Asp Phe 230 Ala Asn Arg Val Lys Thr Gly Ser 295 Thr Lys 310 Ser Asn ksn Thr Asn 200 Leu Val Val Asp Val 280 Ala Val Va] Gly Leu 185 Ile Asn Arg Ser Val1 265 Val Asp Lys L Ala Asn C 170 Thr Asp Ser Thr Val 250 Thr Lys Met Leu Asp ;iy sp Giy ryr 235 I'hr Gly Val1 Asnf Val1 315 Gly G1n Thr Val.

Trp.

220 Asp Ala Leu Gly Gin 300 Ser Thr ksn Leu Asn 205 Asn Thr Asp Pro Asn Ser Al a 190 Tyr Ile Val Thr Val.

270 Giu Asn 175 Gly His Gin Asp Ala 255 Gin Tyr V.al Gly Arg Gly Phe 240 His Tyr Tyr 285 Lys Al a Glu Val Ser Asp Glu Gly Thr Asn Thr 320 Asp 330 Gin Val. Thr 350 Gin Leu Lys Ala Leu Gin Asp Lys 340 345 Leu Ser Gly Lys 3"70 Ser Ser 385 Thr Phe Ala Ser Ser Glu Thr 355 Ala Asp Thr Ile Leu 435 Ser Thr Giy Pro Ser 420 Val Asn Gin Giu Lys 405 Lys Glu Ala Thr Leu 390 Lys Gly Ser Tyr Leu 375 Leu Gly Ala Leu Ala Asn 360 Ser Asn Lys Ile Ser Val Asn Thr 425 Asn Lys 440 Gly Gly Ser Gin 410 Thr Leu Gly Leu Ala 395 Val1 Glu Gly Thr Asfl 380 Thr Gly Gly Trp Asp 365 Phe Gly Asp Leu Lys 445 ksn Lys Asp Asp Val1 430 Val Asp Gly Phe Lys Thr Val 400 Giy Lys 415 Glu Ala Gly Val a a a a. a.

a Glu Lys 450 Val Lys 465 Val Lys Thr Gly Gly Ala Asn Gly 530 Ala Ser 545 Ala Gly Leu Tyr Asn Leu Thr Ala 610 Gly Lys 625 Gly Leu Ile Arg Ala Gly Ser Gln Val Ser 515 Ala Asp Glu Ala Glu 595 Ser Ser Ala Leu Val Gly Glu Lys 500 Thr Asn Gly Ile Val 580 Gly Ala Met Ile Ser 660 Gly Asp Gly 485 Ser Lys Gly Ile Ser 565 Ala Lys Leu Val Gly 645 Gly Tyr Lys 470 Thr Val Ile Ala Ser 550 Ala Lys Val Ala Ser 630 Val Thr Gln 455 Val Asn Glu Thr Thr 535 Ala Thr Gly Asn Ala 615 Ile Ser Thr Trp Thr Phe Phe Lys 520 Val Gly Ser Val Lys 600 Ser Ala Arg Asn Val Gly Ser Gly Glu Leu Asp Leu Thr Lys 505 Asp Thr Asn Thr Thr 585 Val Gln Gly Ile Ser 665 Lys Tyr 490 Asp Gly Asp Lys Asp 570 Asn Gly Leu Ser Ser 650 Gln Ala 475 Ala Thr Leu Ala Ala 555 Ala Leu Lys Pro Ser 635 Asp Gly Gly Leu Ala Thr Asp 540 Val Ile Ala Arg Gln 620 Tyr Asn Lys Asp Lys Asn Ile 525 Lys Lys Asn Gly Ala 605 Ala Gln Gly Thr Asn Asp Gly 510 Thr Ile Asn Gly Gln 590 Asp Thr Gly Lys Gly 670 Leu Glu 495 Ala Leu Lys Val Ser 575 Val Ala Met Gln Val 655 Val Lys 480 Leu Asn Ala Val Ala 560 Gln Asn Gly Pro Asn 640 Ile Ala Gly Thr Ser Lys Glu Thr Leu 460 675

INFORMATION FOR SEQ ID NO:16: (i) SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: CCGTGCTTGC CCAACACGCT T 21

INFORMATION FOR SEQ ID NO:17: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: GCTGCCACCT TGCACAACAA C 21

INFORMATION FOR SEQ ID NO:18: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: CTTTCAATGC CAGAAAGTAG G 21

INFORMATION FOR SEQ ID NO:19: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: CTTCAACCGT TGCGGACAAC A 21

Claims

1. A method of treating a patient in need thereof comprising: administering to said patient an immunogenic composition comprising a pharmaceutically acceptable carrier and a recombinant Haemophilus adhesion protein having greater than 50% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

2. The method according to claim 1, wherein said recombinant Haemophilus adhesion protein has a sequence having greater than 60% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

3. The method according to claim 2 wherein the recombinant Haemophilus adhesion protein has the sequence shown in Figure 3 (SEQ ID NO:4).

4. The method according to claim 2 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 15 (SEQ ID NO:15).

5. The method according to claim 2 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 2 (SEQ ID NO:2).

6. The method according to claim 1, wherein said immunogenic composition is administered prophylactically such that subsequent Haemophilus infection is prevented.

7. The method according to claim 1, wherein said immunogenic composition is administered therapeutically to a patient previously exposed or infected by Haemophilus.

8. The method according to claim 1, wherein said immunogenic composition is administered as a single dose.

9. The method according to claim 1, wherein said immunogenic composition is administered in several doses over a period of time.

10. A method of preventing Haemophilus infection comprising: administering to said patient a therapeutically effective amount of a composition comprising a pharmaceutically acceptable carrier and a recombinant Haemophilus adhesion protein having greater than 50% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

11. The method according to claim 10, wherein said recombinant Haemophilus adhesion protein has a sequence having greater than 60% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

12. The method according to claim 11 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 3 (SEQ ID NO:4).

13. The method according to claim 11 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 15 (SEQ ID NO:15).

14. The method according to claim 11 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 2 (SEQ ID NO:2).

15. The method according to claim 10, wherein said immunogenic composition is administered as a single dose.

16. The method according to claim 10, wherein said immunogenic composition is administered in several doses over a period of time.

17. Use of a recombinant Haemophilus adhesion protein having greater than 50% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15) in the manufacture of an immunogenic composition for prophylactic or therapeutic use in generating an immune response.

18. Use according to claim 17, wherein said recombinant Haemophilus adhesion protein has a sequence having greater than 60% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

19. Use according to claim 18 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 3 (SEQ ID NO:4).

20. Use according to claim 18 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 15 (SEQ ID NO:15).

21. Use according to claim 18 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 2 (SEQ ID NO:2).

22. Use according to claim 17, wherein said immunogenic composition is adapted to be administered prophylactically such that subsequent Haemophilus infection is prevented.

23. Use according to claim 17, wherein said immunogenic composition is adapted to be administered therapeutically to a patient previously exposed or infected by Haemophilus.

24. Use according to claim 17, wherein said immunogenic composition is adapted to be administered as a single dose.

25. Use according to claim 17, wherein said immunogenic composition is adapted to be administered in several doses over a period of time.

Dated this 12th day of July 2000.

Washington University AND St. Louis University
By their Patent Attorneys, Davies Collison Cave