AU735717B2

AU735717B2 - Flea protease proteins, nucleic acid molecules, and uses thereof

Info

Publication number: AU735717B2
Application number: AU28015/97A
Authority: AU
Inventors: Glenn R Frank; Patrick J. Gaines; Robert B Grieve; Shirley Wu Hunter; Keith E Rushlow; Gary Silver; Gary L. Steigler
Original assignee: Heska Corp
Current assignee: Heska Corp
Priority date: 1996-04-24
Filing date: 1997-04-24
Publication date: 2001-07-12
Anticipated expiration: 2017-04-24
Also published as: AU2801597A; JP2001510324A; IL120704A0; WO1997040058A1; CA2252581A1; EP0900231A1

Description

WO 97/40058 PCT/US97/06121

FLEA PROTEASE PROTEINS, NUCLEIC ACID MOLECULES, AND USES THEREOF

Field of the Invention

The present invention relates to novel flea protease proteins and their use to reduce flea infestation of animals. The present invention also relates to the use of anti-flea protease antibodies and other compounds that reduce flea protease activity to reduce flea infestation of animals.

Background of the Invention

Fleas, which belong to the insect order Siphonaptera, are obligate ectoparasites for a wide variety of animals, including birds and mammals. Flea infestation of animals is of health and economic concern because fleas are known to cause and/or transmit a variety of diseases. Fleas cause and/or carry infectious agents that cause, for example, flea allergy dermatitis, anemia, murine typhus, plague and tapeworm. In addition, fleas are a problem for animals maintained as pets because the infestation becomes a source of annoyance for the pet owner who may find his or her home generally contaminated with fleas which feed on the pets. As such, fleas are a problem not only when they are on an animal but also when they are in the general environment of the animal.

The medical and veterinary importance of flea infestation has prompted the development of reagents capable of controlling flea infestation. Commonly encountered methods to control flea infestation are generally focussed on use of insecticides in formulations such as sprays, shampoos, dusts, dips, or foams, or in pet collars. While some of these products are efficacious, most, at best, offer protection of a very limited duration.

Furthermore, many of the methods are often not successful in reducing flea populations on the pet for one or more of the following reasons: failure of owner compliance (frequent administration is required); behavioral or physiological intolerance of the pet to the pesticide product or means of administration; and the emergence of flea populations resistant to the prescribed dose of pesticide. Additional anti-flea products include nontoxic reagents such as insect growth regulators (IGRs), including methoprene, which mimics flea hormones and affects flea larval development.

An alternative method for controlling flea infestation is the use of flea vaccines to be administered to animals prior to or during flea infestation. However, despite considerable interest in developing anti-flea reagents, no flea vaccine presently exists.

Summary of the Invention

The present invention relates to flea serine protease proteins, to flea aminopeptidase proteins, and to flea cysteine protease proteins; to flea serine protease, aminopeptidase and/or cysteine protease nucleic acid molecules, including those that encode such proteins; to antibodies raised against such proteins; and to compounds that inhibit flea serine protease, aminopeptidase and/or cysteine protease activities. The present invention also includes methods to obtain such proteins, nucleic acid molecules, antibodies, and inhibitors. Also included in the present invention are therapeutic compositions comprising such proteins, nucleic acid molecules, antibodies, and/or inhibitors as well as the use of such therapeutic compositions to protect a host animal from flea infestation.

One embodiment of the present invention is an isolated nucleic acid molecule that hybridizes under stringent hybridization conditions with a gene including a serine protease gene comprising a nucleic acid sequence including a nucleic acid molecule including SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122, SEQ ID NO:84 and/or SEQ ID NO:45; an aminopeptidase gene comprising a nucleic acid molecule selected from the group consisting of SEQ ID NO:110 and/or SEQ ID NO:112; and a cysteine protease gene comprising a nucleic acid molecule selected from the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:76 and/or SEQ ID NO:94.

The present invention also includes a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid sequence encoding a protein comprising an amino acid sequence including SEQ ID SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and/or SEQ ID NO:95, or with a nucleic acid sequence that is a complement of any of the nucleic acid sequences.

A preferred nucleic acid sequence of the present invention includes a nucleic acid molecule comprising a nucleic acid sequence including SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43 and SEQ ID SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122, SEQ ID NO:84, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:76, SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93 and/or SEQ ID NO:94, and allelic variants thereof.

The present invention also includes an isolated protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence including SEQ ID SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID

The present invention also relates to recombinant molecules, recombinant viruses and recombinant cells that include a nucleic acid molecule of the present invention.

Also included are methods to produce such nucleic acid molecules, recombinant molecules, recombinant viruses and recombinant cells.

Yet another embodiment of the present invention is a therapeutic composition that is capable of reducing hematophagous ectoparasite infestation. Such a therapeutic composition includes an excipient and a protective compound including: an isolated protein or mimetope thereof encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence including SEQ ID SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95; an isolated nucleic acid molecule that hybridizes under stringent hybridization conditions with a gene comprising a nucleic acid sequence including SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122, SEQ ID NO:84, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:76, SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93 and SEQ ID NO:94; an isolated antibody that selectively binds to a protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence including SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95; an inhibitor of protease activity identified by its ability to inhibit the activity of a protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence including SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID and a mixture thereof. Also included in the present invention is a method to reduce flea infestation, comprising the step of administering to the animal a therapeutic composition of the present invention.

Another embodiment of the present invention is a method to identify a compound capable of inhibiting flea protease activity, the method comprising: contacting an isolated flea protease protein comprising an amino acid sequence including SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95 with a putative inhibitory compound under conditions in which, in the absence of said compound, the protein has proteolytic activity; and determining if the putative inhibitory compound inhibits the activity. The present invention also includes a kit to identify a compound capable of inhibiting flea protease activity.

The present invention also includes an isolated flea protease protein that cleaves an immunoglobulin, when the protein is incubated in the presence of the immunoglobulin in about 100 microliters of about 0.2M Tris-HCl for about 18 hours at about 37°C. A preferred protease protein capable of cleaving immunoglobulin comprises an amino acid sequence selected from the group consisting of SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73 and SEQ ID NO:96.

Another embodiment of the present invention includes a method to identify a compound capable of inhibiting flea immunoglobulin proteinase protein activity, the method comprising: (a) contacting an isolated flea immunoglobulin proteinase protein with a putative inhibitory compound under conditions in which, in the absence of the compound, the protein has immunoglobulin proteinase activity; and (b) determining if the putative inhibitory compound inhibits the activity.

Brief Description of the Drawings

Fig. 1 is a scanned image depicting SDS-PAGE of DFP-labeled larval proteases in unfed larvae, fed 1st instar larvae and fed 3rd instar larvae.

Detailed Description of the Invention

The present invention includes the use of compounds that inhibit flea protease activity to protect a host animal from flea infestation. The inventors have discovered that proteases are significant components of the flea midgut and are good targets for immunotherapeutic and/or chemotherapeutic intervention to reduce flea burden both on the host animal and in the immediate (i.e., surrounding) environment of the animal. The inventors have shown, for example, that the viability and/or fecundity of fleas consuming a blood meal is reduced when the blood meal contains compounds that reduce flea protease activity, probably because the compounds interfere with flea digestion and other functions. Compounds that reduce the amount and/or activity of flea proteases without substantially harming the host animal are included in the present invention. Such compounds include flea protease vaccines, anti-flea protease antibodies, flea protease inhibitors, and/or compounds that suppress protease synthesis; such compounds are discussed in more detail below.

One embodiment of the present invention is a method to protect a host animal from flea infestation by treating the animal with a composition that includes a compound that reduces the protease activity of fleas feeding (includes fleas in the process of feeding as well as fleas having fed) from the treated animal thereby reducing the flea burden on the animal and in the environment of the animal.

It is to be noted that the term "a" or "an" entity refers to one or more of that entity; for example, a compound refers to one or more compounds. As such, the terms "a" (or "an"), "one or more" and "at least one" can be used interchangeably herein. Thus, a composition of the present invention can include one or more compounds that target (i.e., reduce the activity of) one or more proteases in the flea.

As used herein, the phrase "to protect an animal from flea infestation" refers to reducing the potential for flea population expansion on and around the animal (i.e., reducing the flea burden). Preferably, the flea population size is decreased, optimally to an extent that the animal is no longer bothered by fleas. A host animal, as used herein, is an animal from which fleas can feed by attaching to and feeding through the skin of the animal. Fleas, and other ectoparasites, can live on a host animal for an extended period of time or can attach temporarily to an animal in order to feed. At any given time, a certain percentage of a flea population can be on a host animal whereas the remainder can be in the environment surrounding the animal (i.e., in the environment of the animal). Such an environment can include not only adult fleas, but also flea eggs and/or flea larvae. The environment can be of any size such that fleas in the environment are able to jump onto and off of a host animal. As such, it is desirable not only to reduce the flea burden on an animal per se, but also to reduce the flea burden in the environment surrounding the animal.

In accordance with the present invention, a host animal is treated by administering to the animal a compound of the present invention in such a manner that the compound itself (e.g., a protease inhibitor, protease synthesis suppressor or anti-flea protease antibody) or a product generated by the animal in response to administration of the compound (e.g., antibodies produced in response to a flea protease vaccine, or conversion of an inactive inhibitor "prodrug" to an active protease inhibitor) ultimately enters the flea midgut. An animal is preferably treated in such a way that the compound or product thereof enters the blood stream of the animal. Fleas are then exposed to the compound when they feed from the animal.

For example, flea protease inhibitors administered to an animal are administered in such a way that the inhibitors enter the blood stream of the animal, where they can be taken up by feeding fleas. In another embodiment, when a host animal is administered a flea protease vaccine, the treated animal mounts an immune response resulting in the production of antibodies against the protease (anti-flea protease antibodies) which circulate in the animal's blood stream and are taken up by fleas upon feeding. Blood taken up by fleas enters the flea midgut where compounds of the present invention, or products thereof, such as anti-flea protease antibodies, flea protease inhibitors, and/or protease synthesis suppressors, interact with and reduce proteolytic activity in the flea midgut. The present invention also includes the ability to reduce larval flea infestation in that when fleas feed from a host animal that has been administered a therapeutic composition of the present invention, at least a portion of compounds of the present invention, or products thereof, in the blood taken up by the flea are excreted by the flea in feces, which is subsequently ingested by flea larvae. It is of note that flea larvae obtain most, if not all, of their nutrition from flea feces.

In accordance with the present invention, reducing proteolytic activity in flea midguts can lead to a number of outcomes that reduce flea burden on treated animals and their surrounding environments. Such outcomes include, but are not limited to, (a) reducing the viability of fleas that feed from the treated animal, (b) reducing the fecundity of female fleas that feed from the treated animal, (c) reducing the reproductive capacity of male fleas that feed from the treated animal, (d) reducing the viability of eggs laid by female fleas that feed from the treated animal, (e) altering the blood feeding behavior of fleas that feed from the treated animal (e.g., fleas take up less volume per feeding or feed less frequently), (f) reducing the viability of flea larvae, for example due to the feeding of larvae from feces of fleas that feed from the treated animal, and/or (g) altering the development of flea larvae (e.g., by decreasing feeding behavior, inhibiting growth, inhibiting (e.g., slowing or blocking) molting, and/or otherwise inhibiting maturation to adults).

One embodiment of the present invention is a composition that includes one or more compounds that reduce the activity of one or more flea proteases directly (e.g., an anti-flea protease antibody or a flea protease inhibitor) and/or indirectly (e.g., a flea protease vaccine). Suitable flea proteases to target include flea aminopeptidases, flea carboxypeptidases and/or flea endopeptidases. Such proteases can include cytosolic and/or membrane-bound forms of a protease. Preferred flea proteases to target include, but are not limited to, serine proteases, metalloproteases, aspartic acid proteases and/or cysteine proteases. It is to be noted that these preferred groups of proteases include aminopeptidases, carboxypeptidases and/or endopeptidases.

Preferred flea proteases to target include, but are not limited to, proteases that degrade hemoglobin, proteases involved in blood coagulation and/or lytic (anti-coagulation) pathways, proteases involved in the maturation of peptide hormones, proteases that inhibit complement or other host immune response elements (e.g., antibodies) and/or proteases involved in vitellogenesis. A number of proteases are known to those skilled in the art, including, but not limited to, aminopeptidases, such as leucine aminopeptidase and aminopeptidases B and M; astacin-like metalloproteases; calpains; carboxypeptidases, such as carboxypeptidases A, P and Y; cathepsins, such as cathepsins B, D, E, G, H, and L; chymotrypsins; cruzipains; meprins; papains; pepsins; renins; thermolysins and trypsins. A particularly preferred protease to target is a protease having a proteolytic activity that, when targeted with a composition of the present invention, reduces flea burden without substantially harming the host animal. Such a protease can be identified using, for example, methods as disclosed herein.

One aspect of the present invention is the discovery that a substantial amount of the proteolytic activity found in flea midguts is serine protease activity. Both in vitro and in vivo studies using a number of protease inhibitors substantiate this discovery, details of which are disclosed in the Examples. As such, a particularly preferred protease to target is a serine protease. Examples of serine proteases include, but are not limited to, acrosins, bromelains, cathepsin G, chymotrypsins, collagenases, elastases, factor Xa, ficins, kallikreins, papains, plasmins, Staphylococcal V8 proteases, thrombins and trypsins. In one embodiment, a preferred flea serine protease to target includes a protease having trypsin-like or chymotrypsin-like activity. It is appreciated by those skilled in the art that an enzyme having "like" proteolytic activity has similar activity to the referenced protease, although the exact structure of the preferred substrate cleaved may differ. "Like" proteases usually have similar tertiary structures as their referenced counterparts.

Protease inhibitor studies disclosed in the Examples section also indicate that additional preferred proteases to target include aminopeptidases and/or metalloproteases.

Examples of such proteases include exo- and endometalloproteases, digestive enzymes, and enzymes involved in peptide hormone maturation. One example of an aminopeptidase that is also a metalloprotease is leucine aminopeptidase.

Suitable compounds to include in compositions of the present invention include, but are not limited to, a vaccine comprising a flea protease (a flea protease vaccine), an antibody that selectively binds to a flea protease (an anti-flea protease antibody), a flea protease inhibitor (a compound other than a vaccine or an antibody that inhibits a flea protease), and a mixture of such compounds. As used herein, a mixture thereof refers to a combination of one or more of the cited entities.

Compositions of the present invention can also include compounds to suppress protease synthesis or maturation, such as, but not limited to, protease modulating peptides.

A preferred embodiment of the present invention is a flea protease vaccine and its use to reduce the flea population on and around an animal. A flea protease vaccine can include one or more proteins capable of eliciting an immune response against a flea protease and can also include other components. Preferred flea protease vaccines include a flea serine protease, a flea metalloprotease, a flea aspartic acid protease and/or a flea cysteine protease, with flea serine protease, flea metalloprotease and/or flea aminopeptidase vaccines being more preferred. Examples of flea protease vaccines include soluble flea midgut preparations of the present invention as well as one or more isolated proteins of the present invention.

One embodiment of the present invention is a soluble flea midgut preparation. Such a preparation includes primarily components naturally present in the lumen of a flea midgut and, depending on the method of preparation, can also include one or more peripheral midgut membrane proteins. Methods to preferentially include, or exclude, membrane proteins from such a preparation are known to those skilled in the art. The present invention includes the discovery that such a preparation has proteolytic activity, of which a substantial portion is serine protease activity. Preferably at least about 70 percent of the proteolytic activity in a soluble flea midgut preparation is serine protease activity, as can be indicated by the ability to inhibit at least about percent of the proteolytic activity with 4-(2-aminoethyl)-benzenesulfonyl fluoride hydrochloride (AEBSF). Serine protease activity can also be identified using other known inhibitors or substrates. Other preferred inhibitors that can inhibit at least about 70 percent of the proteolytic activity of a soluble flea midgut preparation of the present invention include soybean trypsin inhibitor, 1,3-diisopropylfluorophosphate or leupeptin.
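The inhibition criterion described above amounts to a percent-inhibition calculation on paired activity measurements. A minimal sketch in Python, assuming proteolytic activity is read in arbitrary but consistent units with and without inhibitor; the function names and the 70 percent default (taken from the "at least about 70 percent" figure in the text) are illustrative assumptions, not an assay protocol from the Examples:

```python
def percent_inhibition(uninhibited_activity, inhibited_activity):
    """Percent of proteolytic activity lost when the inhibitor is present.

    Both arguments are activity readings in the same (arbitrary) units.
    """
    if uninhibited_activity <= 0:
        raise ValueError("uninhibited activity must be positive")
    lost = uninhibited_activity - inhibited_activity
    return 100.0 * lost / uninhibited_activity


def meets_inhibition_threshold(uninhibited_activity, inhibited_activity,
                               threshold_percent=70.0):
    """True if the inhibitor removes at least threshold_percent of activity.

    The 70.0 default is an assumption mirroring the text's
    "at least about 70 percent" criterion.
    """
    inhibition = percent_inhibition(uninhibited_activity, inhibited_activity)
    return inhibition >= threshold_percent


if __name__ == "__main__":
    # Hypothetical readings: 200 units without inhibitor, 50 units with it.
    print(percent_inhibition(200.0, 50.0))
    print(meets_inhibition_threshold(200.0, 50.0))
```

The same calculation could be applied to any of the inhibitors named above, provided the control and inhibited readings come from the same preparation under the same conditions.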

A soluble flea midgut preparation of the present invention includes proteases that range in molecular weight from about 5 kilodaltons (kD or kDa) to about 200 kD, as determined by SDS-PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis), with at least a substantial portion of the serine proteases ranging in molecular weight from about 5 kD to about 60 kD, as determined by SDS-PAGE. A substantial portion of protease activity in a soluble flea midgut preparation of the present invention has a pH activity optimum ranging from about pH 5 to about pH 10, preferably an activity optimum ranging from about pH 7 to about pH 9, and even more preferably an activity optimum of about pH 8. While not being bound by theory, such a pH optimum suggests that a large proportion of proteases in soluble flea midgut preparations of the present invention are serine proteases.

It is also interesting to note that the pH of the flea midgut is also about pH 8. The findings that proteases in soluble flea midgut preparations of the present invention exhibit a varied pattern of inhibition by protease inhibitors of a given type (e.g., serine protease inhibitors), as well as variances seen in molecular weights and pH optima of the proteases, suggest that there are a number of protease isoforms in such preparations.

A soluble flea midgut preparation of the present invention is preferably prepared by a method that includes the steps of disrupting a flea midgut to produce a mixture including a liquid portion and a solid portion and recovering the liquid portion to obtain a soluble flea midgut preparation. Such a method is a simplified version of methods disclosed in U.S. Patent No. 5,356,622, ibid.

It is to be noted that in accordance with the present invention, methods disclosed in U.S. Patent No. 5,356,622, ibid. can also be used to prepare soluble flea midgut preparations having similar proteolytic activities.

Flea midguts can be obtained (e.g., dissected from) from unfed fleas or from fleas that recently consumed a blood meal (i.e., blood-fed fleas). Such midguts are referred to herein as, respectively, unfed flea midguts and fed flea midguts. Flea midguts can be obtained from either male or female fleas. As demonstrated in the Examples section, female flea midguts exhibit somewhat more proteolytic activity than do male flea midguts.

Furthermore, fed flea midguts have significantly more proteolytic activity than do unfed flea midguts. While not being bound by theory, it is believed that blood feeding induces in flea midguts the synthesis and/or activation of proteases as well as other factors (e.g., enzymes, other proteins, co-factors, etc.) important in digesting the blood meal, as well as in neutralizing host molecules potentially damaging to the flea (e.g., complement, immunoglobulins, blood coagulation factors). It is also to be appreciated that unfed flea midguts may contain significant targets not found in fed flea midguts and vice versa. Furthermore, although the present application focuses primarily on flea midgut proteases, it is to be noted that the present invention also includes other components of soluble flea midgut preparations of the present invention that provide suitable targets to reduce flea burden on an animal and in the environment of that animal; see also U.S. Patent No. 5,356,622, ibid.

Methods to disrupt flea midguts in order to obtain a soluble flea midgut preparation are known to those skilled in the art and can be selected according to, for example, the volume being processed and the buffers being used.

Such methods include any technique that promotes cell lysis, including, but not limited to, chemical disruption techniques (e.g., exposure of midguts to a detergent) as well as mechanical disruption techniques (e.g., homogenization, sonication, use of a tissue blender or glass beads, and freeze/thaw techniques).

Methods to recover a soluble flea midgut preparation are also known to those skilled in the art and can include any method by which the liquid portion of disrupted flea midguts is separated from the solid portion (e.g., filtration or centrifugation). In a preferred embodiment, disrupted flea midguts are subjected to centrifugation, preferably at an acceleration ranging from about 10,000 x g to about 15,000 x g for several minutes (e.g., from about 1 min. to about 15 min.). The supernatant from such a centrifugation comprises a soluble flea midgut preparation of the present invention.
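The centrifugation range given above is stated as relative centrifugal force (x g), whereas many centrifuges are set by rotor speed. The following is a minimal sketch of the standard conversion between the two. The 8 cm rotor radius is a hypothetical value chosen only for illustration and does not come from the specification; the function names are likewise illustrative.

```python
import math

# Standard relation: RCF (x g) = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target force (x g)."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 8.0  # hypothetical rotor radius, not taken from the text
    for target in (10_000, 15_000):
        rpm = rpm_from_rcf(target, radius_cm)
        print(f"{target} x g at {radius_cm} cm radius: about {rpm:.0f} rpm")
```

Because the force scales with the square of the speed, the required rpm depends strongly on the rotor actually used, so the radius should be taken from the rotor documentation.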

The present invention also includes an isolated protein that includes an amino acid sequence encoded by a nucleic acid molecule capable of hybridizing under stringent conditions (i.e., that hybridizes under stringent hybridization conditions) with a nucleic acid molecule that encodes a protease (i.e., the nucleic acid molecules hybridize with the nucleic acid strand that is complementary to the coding strand) present in (i.e., that can be found in) a flea midgut, such as a midgut from a blood-fed female flea, a midgut from a blood-fed male flea, a midgut from an unfed female flea or a midgut from an unfed male flea. A preferred midgut protease is present in the lumen of the midgut.

An isolated protein of the present invention, also referred to herein as an isolated protease protein, preferably is capable of eliciting an immune response against a flea midgut protease and/or has proteolytic activity. According to the present invention, an isolated, or biologically pure, protein, is a protein that has been removed from its natural milieu. As such, "isolated" and "biologically pure" do not necessarily reflect the extent to which the protein has been purified. An isolated protease protein can be obtained from its natural source.

Such an isolated protein can also be produced using recombinant DNA technology or chemical synthesis.

As used herein, an isolated protein of the present invention can be a full-length protein or any homologue of such a protein, such as a protein in which amino acids have been deleted (e.g., a truncated version of the protein, such as a peptide), inserted, inverted, substituted and/or derivatized (e.g., by glycosylation, phosphorylation, acetylation, myristoylation, prenylation, palmitoylation, amidation and/or addition of glycerophosphatidyl inositol) such that the homologue comprises a protein having an amino acid sequence that is sufficiently similar to a natural flea midgut protease that a nucleic acid sequence encoding the homologue is capable of hybridizing under stringent conditions to (i.e., with) the complement of a nucleic acid sequence encoding the corresponding natural flea midgut protease amino acid sequence. As used herein, stringent hybridization conditions refer to standard hybridization conditions under which nucleic acid molecules, including oligonucleotides, are used to identify similar nucleic acid molecules. Such standard conditions are disclosed, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Labs Press, 1989; Sambrook et al., ibid., is incorporated by reference herein in its entirety. Stringent hybridization conditions typically permit isolation of nucleic acid molecules having at least about 70% nucleic acid sequence identity with the nucleic acid molecule being used to probe in the hybridization reaction. Formulae to calculate the appropriate hybridization and wash conditions to achieve hybridization permitting 30% or less mismatch of nucleotides are disclosed, for example, in Meinkoth et al., 1984, Anal. Biochem. 138, 267-284; Meinkoth et al., ibid., is incorporated by reference herein in its entirety.
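The relationship between hybridization conditions and permitted mismatch can be illustrated with the commonly cited form of the melting-temperature estimate for DNA:DNA hybrids attributed to Meinkoth et al. The sketch below is an illustration under stated assumptions, not a procedure taken from this specification: the input values (1.0 M sodium, 50% G+C, no formamide, a 500 base pair hybrid) are hypothetical, and the reduction of about 1°C per 1% mismatch is a widely used rule of thumb rather than an exact constant.

```python
import math


def estimate_tm_celsius(na_molar: float, percent_gc: float,
                        percent_formamide: float, hybrid_length_bp: int) -> float:
    """Estimated Tm of a perfectly matched DNA:DNA hybrid.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 0.61*(%formamide) - 500/L
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / hybrid_length_bp)


def temperature_permitting_mismatch(tm_perfect: float, percent_mismatch: float,
                                    degrees_per_percent: float = 1.0) -> float:
    """Approximate temperature at which hybrids with the given mismatch still form.

    Assumes Tm falls by about `degrees_per_percent` per 1% mismatch (rule of thumb).
    """
    return tm_perfect - degrees_per_percent * percent_mismatch


if __name__ == "__main__":
    # Hypothetical conditions, chosen only to make the arithmetic easy to follow.
    tm = estimate_tm_celsius(na_molar=1.0, percent_gc=50.0,
                             percent_formamide=0.0, hybrid_length_bp=500)
    print(f"Estimated Tm for a perfect match: {tm:.1f} C")
    print(f"Approximate limit for 30% mismatch: "
          f"{temperature_permitting_mismatch(tm, 30.0):.1f} C")
```

Under these assumed inputs, lowering the salt concentration or adding formamide lowers the estimated Tm, which is how wash conditions are typically made more stringent.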

The minimal size of a protein homologue of the present invention is a size sufficient to be encoded by a nucleic acid molecule capable of forming a stable hybrid with the complementary sequence of a nucleic acid molecule encoding the corresponding natural protein. As such, the size of the nucleic acid molecule encoding such a protein homologue is dependent on nucleic acid composition and percent homology between the nucleic acid molecule and complementary sequence as well as upon hybridization conditions per se (e.g., temperature, salt concentration, and formamide concentration). The minimal size of such nucleic acid molecules is typically at least about 12 to about 15 nucleotides in length if the nucleic acid molecules are GC-rich and at least about 15 to about 17 bases in length if they are AT-rich. As such, the minimal size of a nucleic acid molecule used to encode a protease protein homologue of the present invention is from about 12 to about 18 nucleotides in length. There is no limit, other than a practical limit, on the maximal size of such a nucleic acid molecule in that the nucleic acid molecule can include a portion of a gene, an entire gene, or multiple genes, or portions thereof. Similarly, the minimal size of a protease protein homologue of the present invention is from about 4 to about 6 amino acids in length, with preferred sizes depending on whether a full-length, multivalent (i.e., fusion protein having more than one domain each of which has a function), or functional portions of such proteins are desired. Protease protein homologues of the present invention preferably have protease activity and/or are capable of eliciting an immune response against a flea midgut protease.
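The correspondence between the minimal nucleic acid size (about 12 to about 18 nucleotides) and the minimal protein size (about 4 to about 6 amino acids) follows from the three-nucleotide codon. A brief sketch of that arithmetic, with an illustrative function name that is not part of the specification:

```python
NUCLEOTIDES_PER_CODON = 3


def max_amino_acids_encoded(n_nucleotides: int) -> int:
    """Number of complete codons, and hence amino acids, in a coding stretch."""
    return n_nucleotides // NUCLEOTIDES_PER_CODON


if __name__ == "__main__":
    for n in (12, 15, 18):
        print(f"{n} nucleotides encode at most {max_amino_acids_encoded(n)} amino acids")
```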

A protease protein homologue of the present invention can be the result of allelic variation of a natural gene encoding a flea protease. A natural gene refers to the form of the gene found most often in nature. Protease protein homologues can be produced using techniques known in the art including, but not limited to, direct modifications to a gene encoding a protein using, for example, classic or recombinant DNA techniques to effect random or targeted mutagenesis. Isolated protease proteins of the present invention, including homologues, can be identified in a straight-forward manner by the proteins' ability to effect proteolytic activity and/or to elicit an immune response against a flea midgut protease. Such techniques are known to those skilled in the art.

A preferred protease protein of the present invention is a flea serine protease, a flea metalloprotease, a flea aspartic acid protease, a flea cysteine protease, or a homologue of any of these proteases. A more preferred protease protein is a flea serine protease, a flea metalloprotease or a homologue of either. Also preferred is a flea aminopeptidase or a homologue thereof. Also preferred is a flea cysteine protease or a homologue thereof. Particularly preferred is a flea serine protease or a homologue thereof.

Preferred protease proteins of the present invention are flea protease proteins having molecular weights ranging from about 5 kD to about 200 kD, as determined by SDS-PAGE, and homologues of such proteins. More preferred are flea protease proteins having molecular weights ranging from about 5 kD to about 60 kD, as determined by SDS-PAGE, and homologues of such proteins. Even more preferred are flea serine protease proteins, particularly those having molecular weights of about 26 kD (denoted PfSP26, now denoted PafSP-26K to distinguish from flea PfSP26 as described in Example 26), about 24 kD (denoted PfSP24, now denoted PafSP-24K to distinguish from flea PfSP24 as described in Example 27), about 19 kD (denoted PfSP19, now denoted PafSP-19K to distinguish from flea PfSP19 as described in Example 32), about 6 kD (denoted PfSP6, now denoted PafSP-6K to distinguish from flea PfSP6 as described in Example 11), about 31 kD (denoted PfSP28), about 25 kD (denoted PlfSP-25K1) from 1st instar larvae, about 25 kD (denoted PlfSP-25K3) from 3rd instar larvae, about 28 kD (denoted PlfSP-28K3) and about 31 kD (denoted PlfSP-31K3), and flea aminopeptidase proteins, particularly those having molecular weights of about 95 kD (denoted as determined by SDS-PAGE, and homologues of such proteins.

One preferred embodiment of the present invention is an isolated flea protease protein that includes an amino acid sequence encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a flea serine protease gene, with a flea aminopeptidase gene or with a flea cysteine protease gene. As used herein, a flea protease gene includes all nucleic acid sequences related to a natural flea protease gene such as regulatory regions that control production of a flea protease protein encoded by that gene (such as, but not limited to, transcription, translation or post-translation control regions) as well as the coding region itself.

The inventors have discovered an extensive family of serine proteases, encoded by a family of serine protease genes. Such a gene family may be due to allelic variants (i.e., genes having similar, but different, sequences at a given locus in a population of fleas) and/or to the existence of serine protease genes at more than one locus in the flea genome. As such, the present invention includes flea serine protease genes comprising not only the nucleic acid sequences disclosed herein (e.g., genes including nucleic acid sequences SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122 and/or SEQ ID NO:84) and/or nucleic acid sequences encoding proteins having amino acid sequences as disclosed herein (e.g., SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, and/or SEQ ID NO:96), but also allelic variants of any of those nucleic acid sequences, as well as other nucleic acid molecules and amino acid sequences disclosed in the Examples section. (It should be noted that since nucleic acid sequencing technology is not entirely error-free, all sequences represented herein are at best apparent (i.e., deduced) nucleic acid or amino acid sequences.)

A preferred flea aminopeptidase gene includes nucleic acid sequence SEQ ID NO:110 and/or SEQ ID NO:112, which encode aminopeptidase proteins having amino acid sequences including SEQ ID NO:107, SEQ ID NO:111 and/or SEQ ID NO:113. Additional preferred aminopeptidase genes include allelic variants of SEQ ID NO:110 and/or SEQ ID NO:112.

A preferred flea cysteine protease gene includes nucleic acid sequence SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:76 and/or SEQ ID NO:94, which encode a cysteine protease protein having amino acid sequences including SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92, SEQ ID NO:77, and/or SEQ ID NO:95. Additional preferred cysteine protease genes include allelic variants of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:76 and/or SEQ ID NO:94.

A preferred flea serine protease protein of the present invention is encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with at least one of the following nucleic acid molecules: nfSP3, nfSP8, nfSP9, nfSP10, nfSP11, nfSP19, nfSP21, nfSP23, nfSP25, nfSP26, nfSP27, nfSP29, nfSP31, nfSP34, nfSP36, nfSP37, nfSP38, nfSP39, nfSP18, nfSP24, nfSP28, nfSP32, nfSP33 and nfSP40. As used herein, each of these nucleic acid molecules represents the entire coding region of a flea serine protease gene of the present invention (at least portions of which are also referred to by flea clone numbers, as described in the Examples).

Nucleic acid molecules that contain partial coding regions or other parts of the corresponding gene are denoted by names that include the size of those nucleic acid molecules (e.g., nfSP40 42). Nucleic acid molecules containing apparent full length coding regions for which the size is known also are denoted by names that include the size of those nucleic acid molecules (e.g., nfSP40 41). The production, and at least partial nucleic acid sequence, of such nucleic acid molecules is disclosed in the Examples.

Particularly preferred serine protease proteins are encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with at least one of the following nucleic acid molecules: nfSP18₅₃₄, nfSP18₇₇₅, nfSP18₂₂₅, nfSP24₄₁₀, nfSP24₁₀₈₉, nfSP24₇₇₄, nfSP24₇₁₁, nfSP28₉₂₃, nfSP32₉₃₃, nfSP32₉₂₄, nfSP32₆₉₉, nfSP33₄₂₆, nfSP33₇₇₈, nfSP33₁₈₉₄, nfSP33₁₂₀₀, nfSP33₇₂₆, nfSP40₈₄₁, nfSP5₈₀₆, nfSP11₃₀₇, nfSP8₅₁₅, nfSP8₄₃₆, nfSP12₇₅₈, nfSP26₆₁₀, nfSP27₃₈₆, nfSP23₄₂₃, nfSP34₃₉₀, nfSP36₁₉₇, nfSP38₃₄₁, nfSP37₂₆₁, nfSP39₂₆₇, nfSP29₆₁₂, nfSP30₆₄₁, nfSP31₆₂₆, nfSP32₄₃₃, nfSP15₈₁₅, nfSP19₈₅₅, nfSP25₈₆₄, nfSP21₅₉₅ and/or nfSP40₇₁₇. Even more preferred serine protease proteins include the following amino acid sequences: SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:89, SEQ ID NO:92, SEQ ID NO:95, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, and/or SEQ ID NO:96. Additional particularly preferred serine protease proteins are encoded by allelic variants of nucleic acid molecules encoding proteins that include the cited amino acid sequences. Also preferred are flea serine protease proteins including regions that have at least about 50%, preferably at least about 75%, and more preferably at least about 90% identity with flea serine protease proteins having amino acid sequences as cited herein.
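The identity thresholds recited above (at least about 50%, 75% and 90%) presuppose some way of scoring identity between two aligned regions. The sketch below shows one simple convention, assumed here for illustration only: identity is counted over aligned columns in which neither sequence has a gap. The specification does not state which convention is intended, other conventions use a different denominator, and the two short peptide strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def identity_tier(identity: float) -> str:
    """Map an identity value onto the thresholds recited in the text."""
    if identity >= 90.0:
        return "at least about 90%"
    if identity >= 75.0:
        return "at least about 75%"
    if identity >= 50.0:
        return "at least about 50%"
    return "below about 50%"


if __name__ == "__main__":
    # Hypothetical aligned peptide regions, for illustration only.
    score = percent_identity("IVGGEDA", "IVGGTDA")
    print(f"{score:.1f}% identity: {identity_tier(score)}")
```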

One embodiment of the present invention is a flea serine protease that degrades immunoglobulin circulating in a host animal (i.e., flea immunoglobulin proteinase or IgGase). An example of a flea immunoglobulin proteinase is presented in the Examples section. Preferably, an immunoglobulin proteinase of the present invention cleaves an immunoglobulin when the protein is incubated in the presence of the immunoglobulin in about 100 microliters of about 0.2 M Tris-HCl for about 18 hours at about 37°C. More preferably, an immunoglobulin proteinase of the present invention cleaves an immunoglobulin in about 300 microliters of 50 mM Tris-HCl, pH 8.0, for about 1 hour at about 37°C. Suitable immunoglobulin proteinase proteins of the present invention are capable of cleaving the hinge region of an immunoglobulin heavy chain. The hinge region of an immunoglobulin is the flexible domain that joins the Fab arms of the immunoglobulin to the Fc portion of the molecule. A more preferred immunoglobulin proteinase protein includes a protein having a molecular weight ranging from about 25 kD to about 35 kD and more preferably having a molecular weight of about 31 kD, in its mature form. An even more preferred immunoglobulin proteinase protein includes a protein comprising an amino acid sequence including SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73 and/or SEQ ID NO:96, which can be encoded by a gene comprising nucleic acid sequence SEQ ID NO:66. Without being bound by theory, the proteinase activity of an immunoglobulin proteinase of the present invention cleaves an immunoglobulin in such a manner that the immunoglobulin maintains intact heavy and light chain pairs, either as two Fab fragments or one F(ab')₂ fragment. As used herein, a Fab fragment refers to complete immunoglobulin light chains paired with the variable region and CH1 domains of an immunoglobulin heavy chain. As used herein, a F(ab')₂ fragment refers to two Fab fragments that remain linked by a disulfide bond. Both Fab and F(ab')₂ fragments are capable of binding antigen.

A preferred immunoglobulin proteinase protein of the present invention is capable of cleaving the hinge region of an immunoglobulin heavy chain at a site comprising an amino acid sequence including S-I-F-I-F-P-P-K-P-K-D (SEQ ID NO:104), D-C-P-K-C-P-P-P-E-M-L-G-G-P-S-I-F-I-F-P-P-K-P-K-D-D-L-L-I-K-R-K-S-E-V (SEQ ID NO:105) and/or D-C-P-K-C-P-P-P-E-M-L-G-G-P-S-I-F-I-F-P-P-K-P-K-D-T-L-S-I-S-R-T-P-E-V (SEQ ID NO:106).

A preferred flea aminopeptidase protein of the present invention is encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with at least one of the following nucleic acid molecules: nfAP and/or nfAP2 (flea aminopeptidase full-length coding regions of a flea aminopeptidase gene of the present invention). Particularly preferred aminopeptidase proteins are encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with at least one of the following nucleic acid molecules: nfAP₄₅₃, nfAP₉₀₀, nfAP₇₃₂, nfAP₁₅₈₀, nfAP2₃₈₃ and/or nfAP2₅₃₇. More preferred is an aminopeptidase protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with at least one of the following nucleic acid molecules: nfAP2₃₈₃ and/or nfAP2₅₃₇. Even more preferred is an aminopeptidase protein that includes amino acid sequence SEQ ID NO:107, SEQ ID NO:111 and/or SEQ ID NO:113, or an aminopeptidase protein encoded by an allelic variant of a nucleic acid molecule that includes SEQ ID NO:110 and/or SEQ ID NO:112. Also preferred are flea aminopeptidase proteins including regions that have at least about preferably at least about 75%, and more preferably at least about 90% identity with flea aminopeptidase proteins having amino acid sequences as cited herein.

A preferred flea cysteine protease protein of the present invention is encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with nucleic acid molecule nfCP1 (a flea cysteine protease full-length coding region that includes nfCP1 57 or nfCP1 0, the production of which is described in the Examples). Even more preferred is a cysteine protease that includes amino acid sequence SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92, SEQ ID NO:95, SEQ ID NO:77, or a cysteine protease encoded by an allelic variant of a nucleic acid molecule that includes SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:76, or SEQ ID NO:94. Also preferred is a flea cysteine protease protein including regions that have at least about preferably at least about 75%, and more preferably at least about 90% identity with SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92, SEQ ID NO:77, or SEQ ID NO:95.

One embodiment of the present invention is an isolated protein having proteolytic activity that is substantially inhibited by a serine protease inhibitor, an aminopeptidase inhibitor and/or a cysteine protease inhibitor. Such inhibition can be measured by techniques known to those skilled in the art. To be substantially inhibited means, for example, for a serine protease, that at least half of the proteolytic activity of the protease protein is inhibited by a serine protease inhibitor. Preferably at least about 70 percent, and even more preferably at least about 90 percent of the proteolytic activity of the protease protein is inhibited by a serine protease inhibitor. Preferred serine protease inhibitors include flea serpin proteins, and peptides or analogs thereof.
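The definition of "substantially inhibited" above reduces to a comparison of proteolytic activity measured with and without inhibitor. The following is a minimal sketch of that calculation. The activity values are hypothetical readings in arbitrary units, the function names are illustrative, and the specification does not prescribe a particular assay.

```python
def percent_inhibition(activity_without_inhibitor: float,
                       activity_with_inhibitor: float) -> float:
    """Percentage of proteolytic activity lost in the presence of an inhibitor."""
    if activity_without_inhibitor <= 0:
        raise ValueError("uninhibited activity must be positive")
    return 100.0 * (1.0 - activity_with_inhibitor / activity_without_inhibitor)


def is_substantially_inhibited(inhibition_percent: float,
                               threshold_percent: float = 50.0) -> bool:
    """True when at least half (by default) of the activity is inhibited."""
    return inhibition_percent >= threshold_percent


if __name__ == "__main__":
    # Hypothetical assay readings in arbitrary units.
    inhibition = percent_inhibition(activity_without_inhibitor=200.0,
                                    activity_with_inhibitor=50.0)
    print(f"{inhibition:.0f}% inhibited; substantially inhibited: "
          f"{is_substantially_inhibited(inhibition)}")
```

Passing 70.0 or 90.0 as `threshold_percent` checks the preferred and more preferred levels recited in the text.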

An isolated protein of the present invention can be produced in a variety of ways, including recovering such a protein from a flea midgut and producing such a protein recombinantly. In one embodiment, a flea midgut protease can be recovered by methods heretofore disclosed for obtaining a soluble flea midgut preparation. A flea midgut protease protein can be further purified from a disrupted flea midgut by a number of techniques known to those skilled in the art, including, but not limited to, affinity chromatography, ion exchange chromatography, filtration, electrophoresis (e.g., standard, capillary and flow-through electrophoresis), hydrophobic interaction chromatography, gel filtration chromatography, reverse phase chromatography, concanavalin A chromatography, chromatofocusing and differential solubilization. In one embodiment, a flea midgut protease is purified using protease inhibitor affinity chromatography, an example of which is disclosed in the Examples section.

Another embodiment of the present invention is a method to produce an isolated protein of the present invention using recombinant DNA technology. Such a method includes the steps of (a) culturing a recombinant cell comprising a nucleic acid molecule encoding a protein of the present invention to produce the protein and (b) recovering the protein therefrom. Details on producing recombinant cells and culturing thereof are presented below. The phrase "recovering the protein" refers simply to collecting the whole fermentation medium containing the protein and need not imply additional steps of separation or purification. Proteins of the present invention can be purified using a variety of standard protein purification techniques, as heretofore disclosed.

Isolated proteins of the present invention are preferably retrieved in "substantially pure" form. As used herein, "substantially pure" refers to a purity that allows for the effective use of the protein as a vaccine. A vaccine for animals, for example, should exhibit no substantial toxicity and should be capable of stimulating the production of antibodies in a vaccinated animal.

Another embodiment of the present invention is an isolated nucleic acid molecule capable of hybridizing under stringent conditions with a gene encoding a flea protease present in a flea midgut. Such a nucleic acid molecule is also referred to herein as a flea protease nucleic acid molecule. Particularly preferred is an isolated nucleic acid molecule that hybridizes under stringent conditions with a flea serine protease gene, with a flea aminopeptidase gene or with a flea cysteine protease gene.

The characteristics of such genes are disclosed herein. In accordance with the present invention, an isolated nucleic acid molecule is a nucleic acid molecule that has been removed from its natural milieu (i.e., that has been subject to human manipulation). As such, "isolated" does not reflect the extent to which the nucleic acid molecule has been purified. An isolated nucleic acid molecule can include DNA, RNA, or derivatives of either DNA or RNA.

As stated above, a flea protease gene includes all nucleic acid sequences related to a natural flea protease gene such as regulatory regions that control production of a flea protease protein encoded by that gene (such as, but not limited to, transcription, translation or post-translation control regions) as well as the coding region itself. A nucleic acid molecule of the present invention can be an isolated natural flea protease nucleic acid molecule or a homologue thereof. A nucleic acid molecule of the present invention can include one or more regulatory regions, full-length or partial coding regions, or combinations thereof. The minimal size of a flea protease nucleic acid molecule of the present invention is the minimal size capable of forming a stable hybrid under stringent hybridization conditions with a corresponding natural gene. Flea protease nucleic acid molecules can also include a nucleic acid molecule encoding a hybrid protein, a fusion protein, a multivalent protein or a truncation fragment.

An isolated nucleic acid molecule of the present invention can be obtained from its natural source either as an entire (i.e., complete) gene or a portion thereof capable of forming a stable hybrid with that gene. As used herein, the phrase "at least a portion of" an entity refers to an amount of the entity that is at least sufficient to have the functional aspects of that entity. For example, at least a portion of a nucleic acid sequence, as used herein, is an amount of a nucleic acid sequence capable of forming a stable hybrid with the corresponding gene under stringent hybridization conditions.

An isolated nucleic acid molecule of the present invention can also be produced using recombinant DNA technology (e.g., polymerase chain reaction (PCR) amplification, cloning) or chemical synthesis. Isolated flea protease nucleic acid molecules include natural nucleic acid molecules and homologues thereof, including, but not limited to, natural allelic variants and modified nucleic acid molecules in which nucleotides have been inserted, deleted, substituted, and/or inverted in such a manner that such modifications do not substantially interfere with the nucleic acid molecule's ability to encode a flea protease protein of the present invention or to form stable hybrids under stringent conditions with natural nucleic acid molecule isolates.

A flea protease nucleic acid molecule homologue can be produced using a number of methods known to those skilled in the art (see, for example, Sambrook et al., ibid.). For example, nucleic acid molecules can be modified using a variety of techniques including, but not limited to, classic mutagenesis techniques and recombinant DNA techniques, such as site-directed mutagenesis, chemical treatment of a nucleic acid molecule to induce mutations, restriction enzyme cleavage of a nucleic acid fragment, ligation of nucleic acid fragments, polymerase chain reaction (PCR) amplification and/or mutagenesis of selected regions of a nucleic acid sequence, synthesis of oligonucleotide mixtures and ligation of mixture groups to "build" a mixture of nucleic acid molecules and combinations thereof. Nucleic acid molecule homologues can be selected from a mixture of modified nucleic acids by screening for the function of the protein encoded by the nucleic acid (e.g., the ability of a homologue to elicit an immune response against a flea protease and/or to have proteolytic activity) and/or by hybridization with isolated flea protease nucleic acids under stringent conditions.

An isolated flea protease nucleic acid molecule of the present invention can include a nucleic acid sequence that encodes at least one flea protease protein of the present invention, examples of such proteins being disclosed herein. Although the phrase "nucleic acid molecule" primarily refers to the physical nucleic acid molecule and the phrase "nucleic acid sequence" primarily refers to the sequence of nucleotides on the nucleic acid molecule, the two phrases can be used interchangeably, especially with respect to a nucleic acid molecule, or a nucleic acid sequence, being capable of encoding a flea protease protein.

One embodiment of the present invention is a flea protease nucleic acid molecule of the present invention that is capable of hybridizing under stringent conditions to a nucleic acid strand that encodes at least a portion of a flea protease or a homologue thereof or to the complement of such a nucleic acid strand. A nucleic acid sequence complement of any nucleic acid sequence of the present invention refers to the nucleic acid sequence of the nucleic acid strand that is complementary to (i.e., can form a complete double helix with) the strand for which the sequence is cited. It is to be noted that a double-stranded nucleic acid molecule of the present invention for which a nucleic acid sequence has been determined for one strand, that is represented by a SEQ ID NO, also comprises a complementary strand having a sequence that is a complement of that SEQ ID NO. As such, nucleic acid molecules of the present invention, which can be either double-stranded or single-stranded, include those nucleic acid molecules that form stable hybrids under stringent hybridization conditions with either a given SEQ ID NO denoted herein and/or with the complement of that SEQ ID NO, which may or may not be denoted herein. Methods to deduce a complementary sequence are known to those skilled in the art. Preferred is a flea protease nucleic acid molecule that includes a nucleic acid sequence having at least about 65 percent, preferably at least about percent, more preferably at least about 85 percent, and even more preferably at least about 95 percent homology with the corresponding region(s) of the nucleic acid sequence encoding at least a portion of a flea protease protein. Particularly preferred is a flea protease nucleic acid molecule capable of encoding at least a portion of a flea protease that naturally is present in flea midguts and preferably is included in a soluble flea midgut preparation of the present invention. Examples of nucleic acid molecules of the present invention are disclosed in the Examples section.
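Deducing the complement of a cited strand, as mentioned above, is a mechanical operation: each base is replaced by its pairing partner and the result is read in the opposite direction so that both strands are written 5' to 3'. The sketch below illustrates this for DNA using only the four standard bases; the function name and the example sequence are hypothetical and are not drawn from the sequence listing.

```python
# Watson-Crick pairing for the four standard DNA bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    cited_strand = "ATGGCCTAA"  # hypothetical sequence, for illustration only
    print(cited_strand, "->", reverse_complement(cited_strand))
```

Applying the function twice returns the original sequence, which is a quick check that the two strands are complements of one another.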

A preferred flea serine protease nucleic acid molecule of the present invention is a nucleic acid molecule that hybridizes under stringent hybridization conditions with at least one of the following nucleic acid molecules: nfSP3, nfSP8, nfSP9, nfSP10, nfSP11, nfSP19, nfSP20, nfSP21, nfSP23, nfSP25, nfSP26, nfSP27, nfSP29, nfSP30, nfSP31, nfSP34, nfSP36, nfSP37, nfSP38, nfSP39, nfSP18, nfSP24, nfSP28, nfSP32, nfSP33 and/or nfSP40. More preferred is a nucleic acid molecule that hybridizes under stringent hybridization conditions with at least one of the following nucleic acid molecules: nfSP18₅₃₄, nfSP18₇₇₅, nfSP18₂₂₅, nfSP24₄₁₀, nfSP24₁₀₈₉, nfSP24₇₇₄, nfSP24₇₁₁, nfSP28₉₂₃, nfSP32₉₃₃, nfSP32₉₂₄, nfSP32₆₉₉, nfSP33₄₂₆, nfSP33₇₇₈, nfSP33₁₈₉₄, nfSP33₁₂₀₀, nfSP33₇₂₆, nfSP40₈₄₁, nfSP5₈₀₆, nfSP11₃₀₇, nfSP8₅₁₅, nfSP8₄₃₆, nfSP12₇₅₈, nfSP26₆₁₀, nfSP27₃₈₆, nfSP23₄₂₃, nfSP34₃₉₀, nfSP36₁₉₇, nfSP38₃₄₁, nfSP37₂₆₁, nfSP39₂₆₇, nfSP29₆₁₂, nfSP30₆₄₁, nfSP31₆₂₆, nfSP32₄₃₃, nfSP15₈₁₅, nfSP19₈₅₅, nfSP25₈₆₄, nfSP21₅₉₅ and/or nfSP40₇₁₇, as well as other specific nucleic acid molecules disclosed in the Examples section. Even more preferred are nucleic acid molecules that include nfSP3, nfSP8, nfSP9, nfSP10, nfSP11, nfSP19, nfSP20, nfSP21, nfSP23, nfSP25, nfSP26, nfSP27, nfSP29, nfSP30, nfSP31, nfSP34, nfSP36, nfSP37, nfSP38, nfSP39, nfSP18, nfSP24, nfSP28, nfSP32, nfSP33 and/or nfSP40 and, even more preferably, nfSP18₅₃₄, nfSP18₇₇₅, nfSP18₂₂₅, nfSP24₄₁₀, nfSP24₁₀₈₉, nfSP24₇₇₄, nfSP24₇₁₁, nfSP28₉₂₃, nfSP32₉₃₃, nfSP32₉₂₄, nfSP32₆₉₉, nfSP33₄₂₆, nfSP33₇₇₈, nfSP33₁₈₉₄, nfSP33₁₂₀₀, nfSP33₇₂₆, nfSP40₈₄₁, nfSP5₈₀₆, nfSP11₃₀₇, nfSP8₅₁₅, nfSP8₄₃₆, nfSP12₇₅₈, nfSP26₆₁₀, nfSP27₃₈₆, nfSP23₄₂₃, nfSP34₃₉₀, nfSP36₁₉₇, nfSP38₃₄₁, nfSP37₂₆₁, nfSP39₂₆₇, nfSP29₆₁₂, nfSP30₆₄₁, nfSP31₆₂₆, nfSP32₄₃₃, nfSP15₈₁₅, nfSP19₈₅₅, nfSP25₈₆₄, nfSP21₅₉₅ and/or nfSP40₇₁₇, as well as other specific nucleic acid molecules disclosed in the Examples section.

Particularly preferred flea serine protease nucleic acid molecules include at least one of the following sequences: SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:32, SEQ ID NO:39, SEQ ID NO:120, SEQ ID NO:117, SEQ ID NO:155, SEQ ID NO:128, SEQ ID NO:136, SEQ ID NO:21, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:130, SEQ ID NO:127, SEQ ID NO:114, SEQ ID NO:152, SEQ ID NO:78, SEQ ID NO:17, SEQ ID NO:23, SEQ ID NO:29, SEQ ID NO:35, SEQ ID NO:42, SEQ ID NO:154, SEQ ID NO:121, SEQ ID NO:125, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:146, SEQ ID NO:82, SEQ ID NO:140, SEQ ID NO:18, SEQ ID NO:25, SEQ ID NO:31, SEQ ID NO:37, SEQ ID NO:43, SEQ ID NO:116, SEQ ID NO:131, SEQ ID NO:118, SEQ ID NO:160, SEQ ID NO:132, SEQ ID NO:148, SEQ ID NO:142, SEQ ID NO:122, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:150, SEQ ID NO:138, SEQ ID NO:80, SEQ ID NO:144, SEQ ID NO:84, and/or SEQ ID NO:45, and complements thereof, as well as other specific nucleic acid molecules disclosed in the Examples section. Also preferred are allelic variants of such nucleic acid molecules.

A preferred flea aminopeptidase nucleic acid molecule of the present invention is a nucleic acid molecule that hybridizes under stringent hybridization conditions with nfAP and/or nfAP2. A more preferred flea aminopeptidase nucleic acid molecule of the present invention is a nucleic acid molecule that hybridizes under stringent hybridization conditions with nfAP 4 s 3 nfAPg 00 nfAP,32, nfAP 1 o 8 0 nfAP2 3 8 and/or nfAP2 3 7. More preferred is an aminopeptidase nucleic acid molecule that includes nfAP 453 nfAP 0 o, nfAP, 2 nfAP 1580 nfAP2 3 ,3 and/or nfAP2 5 7. Particularly preferred is a nucleic acid molecule that includes nucleic acid sequence SEQ ID NO:110 and/or SEQ ID NO:112, a complement of any of such sequences, or allelic variants thereof.

A preferred flea cysteine protease nucleic acid molecule of the present invention is a nucleic acid molecule that hybridizes under stringent hybridization conditions with nfCPs1, 3 or nfCP1 1 n 0 (the production of which are described in the Examples). More preferred is a cysteine protease nucleic acid molecule that includes nfCP1s-, or nfCP 1 109. Particularly preferred is a nucleic acid molecule that includes nucleic acid sequence SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93 and/or SEQ ID NO:94, or allelic variants of such nucleic acid molecules.

Knowing a nucleic acid molecule of a flea protease protein of the present invention allows one skilled in the art to make copies of that nucleic acid molecule as well as to obtain a nucleic acid molecule including additional portions of flea protease protein-encoding genes (e.g., nucleic acid molecules that include the translation start site and/or transcription and/or translation control regions), and/or flea protease nucleic acid molecule homologues. Knowing a portion of an amino acid sequence of a flea protease protein of the present invention allows one skilled in the art to clone nucleic acid sequences encoding such a flea protease protein. In addition, a desired flea protease nucleic acid molecule can be obtained in a variety of ways including screening appropriate expression libraries with antibodies which bind to flea protease proteins of the present invention; traditional cloning techniques using oligonucleotide probes of the present invention to screen appropriate libraries or DNA; and PCR amplification of appropriate libraries, or RNA or DNA using oligonucleotide primers of the present invention (genomic and/or cDNA libraries can be used). To isolate flea protease nucleic acid molecules, preferred cDNA libraries include cDNA libraries made from unfed whole fleas, fed whole fleas, fed flea midguts, unfed flea midguts, and flea salivary glands. Techniques to clone and amplify genes are disclosed, for example, in Sambrook et al., ibid. The Examples section includes examples of the isolation of cDNA sequences encoding flea protease proteins of the present invention.

The present invention also includes nucleic acid molecules that are oligonucleotides capable of hybridizing, under stringent conditions, with complementary regions of other, preferably longer, nucleic acid molecules of the present invention that encode at least a portion of a flea protease protein. Oligonucleotides of the present invention can be RNA, DNA, or derivatives of either. The minimal size of such oligonucleotides is the size required to form a stable hybrid between a given oligonucleotide and the complementary sequence on another nucleic acid molecule of the present invention. Minimal size characteristics are disclosed herein. The size of the oligonucleotide must also be sufficient for the use of the oligonucleotide in accordance with the present invention. Oligonucleotides of the present invention can be used in a variety of applications including, but not limited to, as probes to identify additional nucleic acid molecules, as primers to amplify or extend nucleic acid molecules or in therapeutic applications to inhibit flea protease production. Such therapeutic applications include the use of such oligonucleotides in, for example, antisense-, triplex formation-, ribozyme- and/or RNA drug-based technologies.

The present invention, therefore, includes such oligonucleotides and methods to interfere with the production of flea protease proteins by use of one or more of such technologies.

The present invention also includes a recombinant vector, which includes a flea protease nucleic acid molecule of the present invention inserted into any vector capable of delivering the nucleic acid molecule into a host cell. Such a vector contains heterologous nucleic acid sequences, that is nucleic acid sequences that are not naturally found adjacent to flea protease nucleic acid molecules of the present invention. The vector can be either RNA or DNA, either prokaryotic or eukaryotic, and typically is a virus or a plasmid. Recombinant vectors can be used in the cloning, sequencing, and/or otherwise manipulating of flea protease nucleic acid molecules of the present invention. One type of recombinant vector, herein referred to as a recombinant molecule and described in more detail below, can be used in the expression of nucleic acid molecules of the present invention. Preferred recombinant vectors are capable of replicating in the transformed cell.

Preferred nucleic acid molecules to include in recombinant vectors of the present invention are disclosed herein.

As heretofore disclosed, one embodiment of the present invention is a method to produce a flea protease protein of the present invention by culturing a cell capable of expressing the protein under conditions effective to produce the protein, and recovering the protein. A preferred cell to culture is a recombinant cell that is capable of expressing the flea protease protein, the recombinant cell being produced by transforming a host cell with one or more nucleic acid molecules of the present invention. Transformation of a nucleic acid molecule into a cell can be accomplished by any method by which a nucleic acid molecule can be inserted into the cell.

Transformation techniques include, but are not limited to, transfection, electroporation, microinjection, lipofection, adsorption, and protoplast fusion. A recombinant cell may remain unicellular or may grow into a tissue, organ or a multicellular organism. Transformed nucleic acid molecules of the present invention can remain extrachromosomal or can integrate into one or more sites within a chromosome of the transformed (i.e., recombinant) cell in such a manner that their ability to be expressed is retained. Preferred nucleic acid molecules with which to transform a host cell are disclosed herein.

Suitable host cells to transform include any cell that can be transformed and that can express the introduced flea protease protein. Such cells are, therefore, capable of producing flea protease proteins of the present invention after being transformed with at least one nucleic acid molecule of the present invention. Host cells can be either untransformed cells or cells that are already transformed with at least one nucleic acid molecule.

Suitable host cells of the present invention can include bacterial, fungal (including yeast), insect, animal and plant cells. Preferred host cells include bacterial, yeast, insect and mammalian cells, with bacterial (e.g., E. coli) and insect (e.g., Spodoptera) cells being particularly preferred.

A recombinant cell is preferably produced by transforming a host cell with one or more recombinant molecules, each comprising one or more nucleic acid molecules of the present invention operatively linked to an expression vector containing one or more transcription control sequences. The phrase operatively linked refers to insertion of a nucleic acid molecule into an expression vector in a manner such that the molecule is able to be expressed when transformed into a host cell. As used herein, an expression vector is a DNA or RNA vector that is capable of transforming a host cell and of effecting expression of a specified nucleic acid molecule.

Preferably, the expression vector is also capable of replicating within the host cell. Expression vectors can be either prokaryotic or eukaryotic, and are typically viruses or plasmids. Expression vectors of the present invention include any vectors that function (i.e., direct gene expression) in recombinant cells of the present invention, including in bacterial, fungal, insect, animal, and/or plant cells. As such, nucleic acid molecules of the present invention can be operatively linked to expression vectors containing regulatory sequences such as promoters, operators, repressors, enhancers, termination sequences, origins of replication, and other regulatory sequences that are compatible with the recombinant cell and that control the expression of nucleic acid molecules of the present invention. As used herein, a transcription control sequence includes a sequence which is capable of controlling the initiation, elongation, and termination of transcription. Particularly important transcription control sequences are those which control transcription initiation, such as promoter, enhancer, operator and repressor sequences. Suitable transcription control sequences include any transcription control sequence that can function in at least one of the recombinant cells of the present invention. A variety of such transcription control sequences are known to those skilled in the art.

Preferred transcription control sequences include those which function in bacterial, yeast, helminth, insect and mammalian cells, such as, but not limited to, tac, lac, trp, trc, oxy-pro, omp/lpp, rrnB, bacteriophage lambda (λ) (such as λpL and λpR and fusions that include such promoters), bacteriophage T7, T7lac, bacteriophage T3, bacteriophage SP6, bacteriophage SP01, metallothionein, alpha mating factor, Pichia alcohol oxidase, alphavirus subgenomic promoters (such as Sindbis virus subgenomic promoters), baculovirus, Heliothis zea insect virus, vaccinia virus, herpesvirus, poxvirus, adenovirus, simian virus 40, retrovirus actin, retroviral long terminal repeat, Rous sarcoma virus, heat shock, phosphate and nitrate transcription control sequences as well as other sequences capable of controlling gene expression in prokaryotic or eukaryotic cells. Additional suitable transcription control sequences include tissue-specific promoters and enhancers as well as lymphokine-inducible promoters (e.g., promoters inducible by interferons or interleukins). Transcription control sequences of the present invention can also include naturally occurring transcription control sequences naturally associated with a DNA sequence encoding a flea protease protein.

Expression vectors of the present invention may also contain secretory signals (i.e., signal segment nucleic acid sequences) to enable an expressed flea protease protein to be secreted from the cell that produces the protein. Suitable signal segments include a flea protease protein signal segment or any heterologous signal segment capable of directing the secretion of a flea protease protein, including fusion proteins, of the present invention. Preferred signal segments include, but are not limited to, flea protease, tissue plasminogen activator (t-PA), interferon, interleukin, growth hormone, histocompatibility and viral envelope glycoprotein signal segments.

Expression vectors of the present invention may also contain fusion sequences which lead to the expression of inserted nucleic acid molecules of the present invention as fusion proteins. Inclusion of a fusion sequence as part of a flea protease nucleic acid molecule of the present invention can enhance the stability during production, storage and/or use of the protein encoded by the nucleic acid molecule. Furthermore, a fusion segment can function as a tool to simplify purification of a flea protease protein, such as to enable purification of the resultant fusion protein using affinity chromatography. A suitable fusion segment can be a domain of any size that has the desired function (e.g., increased stability and/or purification tool). It is within the scope of the present invention to use one or more fusion segments. Fusion segments can be joined to amino and/or carboxyl termini of a flea protease protein. Linkages between fusion segments and flea protease proteins can be constructed to be susceptible to cleavage to enable straightforward recovery of the flea protease proteins. Fusion proteins are preferably produced by culturing a recombinant cell transformed with a fusion nucleic acid sequence that encodes a protein including the fusion segment attached to either the carboxyl and/or amino terminal end of a flea protease protein.

A recombinant molecule of the present invention is a molecule that can include at least one of any nucleic acid molecule heretofore described operatively linked to at least one of any transcription control sequence capable of effectively regulating expression of the nucleic acid molecule(s) in the cell to be transformed. A preferred recombinant molecule includes one or more nucleic acid molecules of the present invention, with those that encode one or more flea protease proteins, and particularly one or more flea serine protease, aminopeptidase and/or cysteine protease proteins, being more preferred. Similarly, a preferred recombinant cell includes one or more nucleic acid molecules of the present invention, with those that encode one or more flea protease proteins, and particularly one or more flea serine protease, aminopeptidase, and/or cysteine protease proteins, being more preferred.

It may be appreciated by one skilled in the art that use of recombinant DNA technologies can improve expression of transformed nucleic acid molecules by manipulating, for example, the number of copies of the nucleic acid molecules within a host cell, the efficiency with which those nucleic acid molecules are transcribed, the efficiency with which the resultant transcripts are translated, and the efficiency of post-translational modifications.

Recombinant techniques useful for increasing the expression of nucleic acid molecules of the present invention include, but are not limited to, operatively linking nucleic acid molecules to high-copy number plasmids, integration of the nucleic acid molecules into one or more host cell chromosomes, addition of vector stability sequences to plasmids, substitutions or modifications of transcription control signals (e.g., promoters, operators, enhancers), substitutions or modifications of translational control signals (e.g., ribosome binding sites, Shine-Dalgarno sequences), modification of nucleic acid molecules of the present invention to correspond to the codon usage of the host cell, deletion of sequences that destabilize transcripts, and use of control signals that temporally separate recombinant cell growth from recombinant protein production during fermentation. The activity of an expressed recombinant protein of the present invention may be improved by fragmenting, modifying, or derivatizing the resultant protein.
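One of the techniques listed above, modification of a nucleic acid molecule to correspond to the codon usage of the host cell, can be illustrated with the following sketch. It assumes the standard genetic code and an open reading frame whose length is a multiple of three. The preference table is a placeholder (it simply picks the alphabetically last codon for each amino acid) and does not represent the real codon usage of any host; the function names and the example reading frame are likewise hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Placeholder preference table: one codon per amino acid, chosen arbitrarily
# (alphabetically last). A real table would come from host codon-usage data.
PLACEHOLDER_PREFERRED = {}
for codon, aa in CODON_TABLE.items():
    PLACEHOLDER_PREFERRED[aa] = max(codon, PLACEHOLDER_PREFERRED.get(aa, codon))


def _codons(cds: str):
    """Split an open reading frame into upper-case codons."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a DNA open reading frame into one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in _codons(cds))


def recode(cds: str, preferred: dict) -> str:
    """Replace every codon with the preferred synonymous codon."""
    return "".join(preferred[CODON_TABLE[c]] for c in _codons(cds))


original = "ATGAAATTTGGCTAA"  # made-up reading frame: Met-Lys-Phe-Gly-stop
recoded = recode(original, PLACEHOLDER_PREFERRED)
assert translate(recoded) == translate(original)  # encoded protein unchanged
print(original, recoded)
```

The check at the end captures the point of the technique: the nucleotide sequence changes while the encoded amino acid sequence does not.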

In accordance with the present invention, recombinant cells can be used to produce flea protease proteins of the present invention by culturing such cells under conditions effective to produce such a protein, and recovering the protein. Effective conditions to produce a protein include, but are not limited to, appropriate media, bioreactor, temperature, pH and oxygen conditions that permit protein production. An appropriate, or effective, medium refers to any medium in which a cell of the present invention, when cultured, is capable of producing a flea protease protein. Such a medium is typically an aqueous medium comprising assimilable carbohydrate, nitrogen and phosphate sources, as well as appropriate salts, minerals, metals and other nutrients, such as vitamins. The medium may comprise complex nutrients or may be a defined minimal medium.

Cells of the present invention can be cultured in conventional fermentation bioreactors, which include, but are not limited to, batch, fed-batch, cell recycle, and continuous fermentors. Culturing can also be conducted in shake flasks, test tubes, microtiter dishes, and petri plates. Culturing is carried out at a temperature, pH and oxygen content appropriate for the recombinant cell. Such culturing conditions are well within the expertise of one of ordinary skill in the art.

Depending on the vector and host system used for production, resultant flea protease proteins may either remain within the recombinant cell; be secreted into the fermentation medium; be secreted into a space between two cellular membranes, such as the periplasmic space in E. coli; or be retained on the outer surface of a cell or viral membrane. Methods to purify such proteins are heretofore disclosed.

The present invention also includes isolated anti-flea protease antibodies and their use to reduce flea infestation on a host animal as well as in the environment of the animal. An anti-flea protease antibody is an antibody capable of selectively binding to a protease present in a flea midgut, including female and male fed midguts as well as female and male unfed midguts. An anti-flea protease antibody preferably binds to the protease in such a way as to reduce the proteolytic activity of that protease.

Isolated antibodies are antibodies that have been removed from their natural milieu. The term "isolated" does not refer to the state of purity of such antibodies.

As such, isolated antibodies can include anti-sera containing such antibodies, or antibodies that have been purified to varying degrees. As used herein, the term "selectively binds to" refers to the ability of such antibodies to preferentially bind to the protease against which the antibody was raised (i.e., to be able to distinguish that protease from unrelated components in a mixture). Binding affinities typically range from about M- to about 101. Binding can be measured using a variety of methods known to those skilled in the art including immunoblot assays, immunoprecipitation assays, radioimmunoassays, enzyme immunoassays (e.g., ELISA), immunofluorescent antibody assays and immunoelectron microscopy; see, for example, Sambrook et al., ibid.

Antibodies of the present invention can be either polyclonal or monoclonal antibodies. Antibodies of the present invention include functional equivalents such as antibody fragments and genetically-engineered antibodies, including single chain antibodies, that are capable of selectively binding to at least one of the epitopes of the protein used to obtain the antibodies. Antibodies of the present invention also include chimeric antibodies that can bind to more than one epitope. Preferred antibodies are raised in response to proteins that are encoded, at least in part, by a flea protease nucleic acid molecule of the present invention.

Anti-flea antibodies of the present invention include antibodies raised in an animal administered a flea protease vaccine of the present invention that exert their effect when fleas feed from the vaccinated animal's blood containing such antibodies. Anti-flea antibodies of the present invention also include antibodies raised in an animal against one or more flea protease proteins, or soluble flea midgut preparations, of the present invention that are then recovered from the animal using techniques known to those skilled in the art. Yet additional antibodies of the present invention are produced recombinantly using techniques as heretofore disclosed for flea protease proteins of the present invention.

Antibodies produced against defined proteins can be advantageous because such antibodies are not substantially contaminated with antibodies against other substances that might otherwise cause interference in a diagnostic assay or side effects if used in a therapeutic composition.

Anti-flea protease antibodies of the present invention have a variety of uses that are within the scope of the present invention. For example, such antibodies can be used in a composition of the present invention to passively immunize an animal in order to protect the animal from flea infestation. Anti-flea antibodies can also be used as tools to screen expression libraries and/or to recover desired proteins of the present invention from a mixture of proteins and other contaminants. Furthermore, antibodies of the present invention can be used to target cytotoxic agents to fleas in order to kill fleas. Targeting can be accomplished by conjugating (i.e., stably joining) such antibodies to the cytotoxic agents using techniques known to those skilled in the art.

A preferred anti-flea protease antibody of the present invention can selectively bind to, and preferentially reduce the proteolytic activity of, a flea serine protease, a flea metalloprotease, a flea aspartic acid protease and/or a flea cysteine protease. More preferred anti-flea protease antibodies include anti-flea serine protease antibodies, anti-flea metalloprotease antibodies, anti-flea aminopeptidase antibodies, and anti-flea cysteine protease antibodies. Particularly preferred are anti-flea serine protease antibodies, anti-flea aminopeptidase antibodies, and anti-flea cysteine protease antibodies, including those raised against flea serine protease proteins, flea aminopeptidase proteins or cysteine protease proteins of the present invention.

The present invention also includes the use of protease inhibitors that reduce proteolytic activity of flea proteases to reduce flea infestation of animals and the surrounding environment. As used herein, protease inhibitors are compounds that interact directly with a protease thereby inhibiting that protease's activity, usually by binding to or otherwise interacting with the protease's active site. Protease inhibitors are usually relatively small compounds and as such differ from antiprotease antibodies that interact with the active site of a protease.

Protease inhibitors can be used directly as compounds in compositions of the present invention to treat animals as long as such compounds are not harmful to the animals being treated. Protease inhibitors can also be used to identify preferred types of flea proteases to target using compositions of the present invention. For example, the inventors have shown herein the predominance of serine proteases in flea midguts, particularly in soluble flea midgut preparations, using protease inhibitors. Such knowledge suggests that effective reduction of flea infestation of an animal can be achieved using serine protease vaccines, anti-flea serine protease antibodies and other inhibitors of serine protease synthesis and activity that can be tolerated by the animal. For example, flea immunoglobulin proteinase activity disclosed herein can be targeted to reduce flea infestation. That other proteases are also present in flea midguts according to the present invention also suggests targeting such proteases. Methods to use protease inhibitors are known to those skilled in the art; examples of such methods are disclosed herein.

In one embodiment, a protease inhibitor that can be used in a composition of the present invention to treat an animal is identified by a method including the following steps: (a) identifying candidate (i.e., putative, possible) inhibitor compounds by testing the efficacy of one or more protease inhibitors (i) in vitro for their ability to inhibit flea protease activity and/or (ii) in a flea feeding assay for their ability to reduce the survival and/or fecundity of fleas by adding the inhibitors to the blood meal of a flea being maintained, for example, in a feeding system, such as that described by Wade et al., 1988, J. Med. Entomol. 25, 186-190; and (b) testing the efficacy of the candidate inhibitor compounds in animals infested with fleas. Although one does not need both in vitro assay data and flea feeding assay data to determine which candidate compounds to administer to animals, evaluation of both sets of data is preferred since data from neither of the assays necessarily predicts data to be obtained from the other assay. For example, candidate compounds identified using the in vitro assay may work "in the test tube" but may not work in vivo for a number of reasons, including the presence of interfering components in the blood meal that inhibit the activity of such compounds; for example, although aprotinin can inhibit at least some flea serine proteases in vitro, aprotinin does not work well in the presence of serum proteins, such as are found in the blood. Furthermore, candidate inhibitor compounds identified by the flea feeding assays can include not only desired compounds but also compounds that reduce the viability and/or fecundity of fleas due to general toxicity (e.g., affecting the mitochondria of fleas).

In a preferred embodiment, an inhibitor of a flea protease of the present invention is identified by a method comprising: (a) contacting an isolated flea protease protein comprising an amino acid sequence including SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and/or SEQ ID NO:95 with a putative inhibitory compound under conditions in which, in the absence of the compound, the protein has proteolytic activity; and (b) determining if the putative inhibitory compound inhibits the activity. A test kit can be used to perform such method. A preferred test kit comprises an isolated flea protease protein comprising an amino acid sequence including SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and/or SEQ ID NO:95, and a means for determining the extent of inhibition of the activity in the presence of a putative inhibitory compound.
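The determination of whether a putative inhibitory compound inhibits the proteolytic activity, and to what extent, can be expressed as a comparison of activity measured with and without the compound. The sketch below is illustrative only: the function names, the compound labels, the activity readings and the 50 percent cut-off are all assumptions made for the example, since the specification does not prescribe a particular calculation or threshold.

```python
def percent_inhibition(activity_without: float, activity_with: float) -> float:
    """Extent of inhibition relative to the uninhibited (no-compound) control."""
    if activity_without <= 0:
        raise ValueError("control must show proteolytic activity")
    return 100.0 * (1.0 - activity_with / activity_without)


def flag_inhibitors(readings: dict, control: float, threshold: float = 50.0) -> list:
    """Names of compounds whose inhibition meets an assumed threshold."""
    return sorted(
        name
        for name, activity in readings.items()
        if percent_inhibition(control, activity) >= threshold
    )


# Hypothetical readings in arbitrary activity units.
control_activity = 200.0
readings = {"compound_A": 50.0, "compound_B": 180.0}

for name, activity in readings.items():
    print(name, percent_inhibition(control_activity, activity))
print(flag_inhibitors(readings, control_activity))
```

The guard on the control value mirrors step (a) of the method, which requires conditions under which the protein has proteolytic activity in the absence of the compound.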

In another embodiment, protease inhibitors are used in the purification of corresponding proteases by, for example, affinity chromatography, in which a protease inhibitor is incubated with a mixture containing a desired protease under conditions in which the inhibitor forms a complex with the protease. The protease can then be recovered from the complex. The protease inhibitor can be attached to a solid support and/or be labelled with, for example, a radioactive, fluorescent, or enzymatic tag that can be used to detect and/or recover the complex.

Suitable protease inhibitors to use in accordance with the present invention include serine protease inhibitors (including immunoglobulin proteinase inhibitors and serpins), metalloprotease inhibitors, aspartic acid protease inhibitors, cysteine protease inhibitors and aminopeptidase inhibitors. Preferred protease inhibitors include serine protease inhibitors, metalloprotease inhibitors, aminopeptidase inhibitors and cysteine protease inhibitors, particularly those that are broad spectrum inhibitors. More preferred are broad spectrum serine protease inhibitors.

There is a wide variety of protease inhibitors, as is known to one skilled in the art. Examples include, but are not limited to, AEBSF, aprotinin, bestatin, chloromethyl ketones such as TLCK (Nα-p-tosyl-L-lysine chloromethyl ketone) and TPCK (N-tosyl-L-phenylalanine chloromethyl ketone), chymostatin, cystatin, 3,4-dichloroisocoumarin, E-64 (trans-epoxysuccinyl-L-leucylamido-(4-guanidino)butane), EDTA (ethylenediaminetetraacetic acid), leupeptin, methyl ketones having a variety of leaving groups, oxidized L-leucinethiol, pepstatin, 1,10-orthophenanthroline, phosphoramidon, soybean trypsin/chymotrypsin inhibitor and soybean trypsin inhibitor. Preferred protease inhibitors for use in the present invention include AEBSF, bestatin, E-64, leupeptin, pepstatin, 1,10-orthophenanthroline, phosphoramidon, TLCK and TPCK, with AEBSF (a broad spectrum serine protease inhibitor), bestatin (an inhibitor of leucine aminopeptidase) and 1,10-orthophenanthroline (a broad spectrum metalloprotease inhibitor) being particularly preferred.

Another preferred inhibitor of the present invention includes an inhibitor of an immunoglobulin proteinase of the present invention. Suitable inhibitors of immunoglobulin proteinase activity are compounds that interact directly with an immunoglobulin proteinase protein's active site, thereby inhibiting that immunoglobulin proteinase's activity, usually by binding to or otherwise interacting with or otherwise modifying the immunoglobulin proteinase's active site. Immunoglobulin proteinase inhibitors can also interact with other regions of the immunoglobulin proteinase protein to inhibit immunoglobulin proteinase activity, for example, by allosteric interaction. Inhibitors of immunoglobulin proteinases are usually relatively small compounds and as such differ from anti-immunoglobulin proteinase antibodies.

Preferably, an immunoglobulin proteinase inhibitor of the present invention is identified by its ability to bind to, or otherwise interact with, a flea immunoglobulin proteinase protein, thereby inhibiting the activity of the flea immunoglobulin proteinase.

Preferred immunoglobulin proteinase inhibitors of the present invention include, but are not limited to, flea immunoglobulin proteinase substrate analogs, and other molecules that bind to a flea immunoglobulin proteinase (e.g., to an allosteric site) in such a manner that proteinase activity of the flea immunoglobulin proteinase is inhibited. An immunoglobulin proteinase substrate analog refers to a compound that interacts with (e.g., binds to, associates with, modifies) the active site of an immunoglobulin proteinase protein. A preferred immunoglobulin proteinase substrate analog inhibits immunoglobulin proteinase activity. Immunoglobulin proteinase substrate analogs can be of any inorganic or organic composition, and, as such, can be, but are not limited to, peptides, nucleic acids, and peptidomimetic compounds. Immunoglobulin proteinase substrate analogs can be, but need not be, structurally similar to an immunoglobulin proteinase's natural substrate as long as they can interact with the active site of that proteinase protein. Immunoglobulin proteinase substrate analogs can be designed using computer-generated structures of immunoglobulin proteinase proteins of the present invention or computer structures of immunoglobulin proteinases' natural substrates. Substrate analogs can also be obtained by generating random samples of molecules, such as oligonucleotides, peptides, peptidomimetic compounds, or other inorganic or organic molecules, and screening such samples by affinity chromatography techniques using the corresponding binding partner (e.g., a flea immunoglobulin proteinase). A preferred immunoglobulin proteinase substrate analog is a peptidomimetic compound (i.e., a compound that is structurally and/or functionally similar to a natural substrate of an immunoglobulin proteinase of the present invention, particularly to the region of the substrate that interacts with the proteinase active site, but that inhibits immunoglobulin proteinase activity upon interacting with the immunoglobulin proteinase active site).

Other preferred flea immunoglobulin proteinase inhibitors of the present invention include antibodies that bind specifically to an immunoglobulin proteinase in such a manner that the proteinase activity of the immunoglobulin proteinase is inhibited. Yet another preferred flea immunoglobulin proteinase inhibitor includes an inhibitor from the class of serine proteinase inhibitors. Suitable immunoglobulin proteinase inhibitors include serine proteinase inhibitors disclosed herein.

Protease inhibitors can be produced using methods known to those skilled in the art. Protein- or peptide-based protease inhibitors, such as cystatin or small peptides comprising a protease substrate, can be produced recombinantly and modified as necessary.

The present invention also includes the use of proteolytically active flea protease proteins of the present invention to identify additional protease inhibitors, and preferably protease inhibitor compounds that can be included in a composition of the present invention to be administered to animals. A method to identify a flea protease inhibitor includes the steps of contacting (e.g., combining, mixing) an isolated flea protease protein with a putative (i.e., candidate) inhibitory compound under conditions in which, in the absence of the compound, the protein has proteolytic activity, and determining if the putative inhibitory compound inhibits the proteolytic activity of the protein.

Putative inhibitory compounds to screen include organic molecules, antibodies (including functional equivalents thereof) and substrate analogs. Methods to determine protease activity are known to those skilled in the art, as heretofore disclosed. Particularly preferred for use in identifying inhibitors are flea serine protease proteins, flea aminopeptidase proteins and flea cysteine protease proteins of the present invention.
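The determining step of the screening method can be illustrated with a minimal sketch. It assumes that proteolytic activity has been measured, in any consistent unit, both in the absence and in the presence of the candidate compound. The function names and the 50 percent cutoff are hypothetical and are used for illustration only; the description does not prescribe a particular cutoff for this step.

```python
def percent_inhibition(activity_without: float, activity_with: float) -> float:
    """Percent reduction in proteolytic activity caused by a candidate compound.

    activity_without: activity of the isolated protease with no compound present.
    activity_with:    activity of the same protease after contact with the compound.
    """
    if activity_without <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (activity_without - activity_with) / activity_without


def inhibits(activity_without: float, activity_with: float,
             cutoff_percent: float = 50.0) -> bool:
    """Return True if the compound reduces activity by at least cutoff_percent.

    The default cutoff is an assumption chosen for this example only.
    """
    return percent_inhibition(activity_without, activity_with) >= cutoff_percent


# Example: a control reading of 200 units falls to 50 units with the compound.
print(percent_inhibition(200.0, 50.0))
print(inhibits(200.0, 50.0))
```

The control measurement without the compound matters because the method requires conditions under which the protein is proteolytically active in the absence of the compound.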

The present invention also includes inhibitors isolated by such a method, and/or test kit, and their use to inhibit any flea protease that is susceptible to such an inhibitor.

It is to be appreciated that the present invention also includes mimetopes of compounds of the present invention that can be used in accordance with methods as disclosed for compounds of the present invention. As used herein, a mimetope of a proteinaceous compound of the present invention (e.g., a flea protease protein, an anti-flea protease antibody, a proteinaceous inhibitor of protease activity or synthesis) refers to any compound that is able to mimic the activity of that proteinaceous compound, often because the mimetope has a structure that mimics the proteinaceous compound. For example, a mimetope of a flea protease protein is a compound that has an activity similar to that of an isolated flea protease protein of the present invention. Mimetopes can be, but are not limited to: peptides that have been modified to decrease their susceptibility to degradation; anti-idiotypic and/or catalytic antibodies, or fragments thereof; non-proteinaceous immunogenic portions of an isolated protein (e.g., carbohydrate structures); and synthetic or natural organic molecules, including nucleic acids. Such mimetopes can be designed using computer-generated structures of proteins of the present invention.

Mimetopes can also be obtained by generating random samples of molecules, such as oligonucleotides, peptides or other organic molecules, and screening such samples by affinity chromatography techniques using the corresponding binding partner.

The present invention includes therapeutic compositions, also referred to herein as compositions, that include a (i.e., at least one) compound of the present invention. Preferred compounds to include in a composition of the present invention include flea protease vaccines, anti-flea protease antibodies and/or protease inhibitors as disclosed herein. Such a therapeutic composition can protect an animal from flea infestation by reducing flea protease activity, thereby reducing flea burden on the animal and in the environment of the animal.

Particularly preferred therapeutic compositions of the present invention include at least one of the following compounds: an isolated flea serine protease protein or a mimetope thereof; an isolated flea serine protease nucleic acid molecule that hybridizes under stringent hybridization conditions with a flea serine protease gene; an isolated antibody that selectively binds to a flea serine protease protein; an inhibitor of flea serine protease activity identified by its ability to inhibit flea serine protease activity; an isolated flea aminopeptidase protein or a mimetope thereof; an isolated flea aminopeptidase nucleic acid molecule that hybridizes under stringent hybridization conditions with a flea aminopeptidase gene; an isolated antibody that selectively binds to a flea aminopeptidase protein; an inhibitor of flea aminopeptidase activity identified by its ability to inhibit flea aminopeptidase activity; an isolated flea cysteine protease protein or a mimetope thereof; an isolated flea cysteine protease nucleic acid molecule that hybridizes under stringent hybridization conditions with a flea cysteine protease gene; an isolated antibody that selectively binds to a flea cysteine protease protein; and an inhibitor of flea cysteine protease activity identified by its ability to inhibit flea cysteine protease activity.

Another embodiment of the present invention is a therapeutic composition that includes a first compound that reduces flea protease activity and a second compound that reduces flea burden by a method other than by reducing flea protease activity. The present invention also includes a method to protect an animal from flea infestation by administering to the animal such a composition. The first compound of such a composition, by effectively reducing flea protease activity in the midgut, enhances the activity of the second compound. While not being bound by theory, it is believed that a number of anti-flea treatments, particularly those that are proteinaceous, are not very effective because they are degraded in the flea midgut.

The present invention permits the effective use of such anti-flea treatments by reducing proteolytic degradation of such treatments by the flea midgut.

Preferred first compounds to include in such a composition include flea protease vaccines, anti-flea protease antibodies and/or protease inhibitors as disclosed herein, such as compounds that target flea immunoglobulin proteinase activity.

A preferred therapeutic composition of the present invention comprises an excipient and a protective compound including: an isolated protein or mimetope thereof encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence including SEQ ID SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and/or SEQ ID NO:95; an isolated nucleic acid molecule that hybridizes under stringent hybridization conditions with a gene comprising a nucleic acid sequence including SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122, SEQ ID NO:84, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:76, SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93 and/or SEQ ID NO:94; an isolated antibody that selectively binds to a protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence including SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and/or SEQ ID NO:95; an inhibitor of protease activity identified by its ability to inhibit the activity of a protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence including SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and/or SEQ ID NO:95; and a mixture thereof.

Suitable second compounds include any anti-flea agent(s), including, but not limited to, proteinaceous compounds, insecticides and flea collars. Preferred second compounds are proteinaceous compounds that effect active immunization (e.g., antigen vaccines), passive immunization (e.g., antibodies), or that otherwise inhibit a flea activity that when inhibited can reduce flea burden on and around an animal. Examples of second compounds include a compound that inhibits binding between a flea membrane protein and its ligand (e.g., a compound that inhibits flea ATPase activity or a compound that inhibits binding of a peptide or steroid hormone to its receptor), a compound that inhibits hormone (including peptide or steroid hormones) synthesis, a compound that inhibits vitellogenesis (including production of vitellin and transport and maturation thereof into a major egg yolk protein), a compound that inhibits fat body function, a compound that inhibits flea muscle action, a compound that inhibits the flea nervous system, a compound that inhibits the flea immune system and/or a compound that inhibits flea feeding.

According to the present invention, an immunoglobulin proteinase of the present invention can also be used as a second compound in a therapeutic composition of the present invention to promote longevity of antibodies that bind specifically to selected flea proteins. An immunoglobulin proteinase can be administered to an animal to promote production of antibodies that bind specifically to the immunoglobulin proteinase, thereby inhibiting the activity of the proteinase. An immunoglobulin proteinase can be administered to an animal either together with or after administration of any desired flea protein to the animal.

A preferred immunoglobulin proteinase to include as a second compound in a therapeutic composition includes: an isolated protein or a mimetope thereof encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising amino acid sequence SEQ ID NO:67, SEQ ID NO:68 and/or SEQ ID NO:69; and/or an isolated nucleic acid molecule that hybridizes under stringent conditions with a gene comprising a nucleic acid sequence including SEQ ID NO:66 and other nucleic acid sequences encoding an immunoglobulin proteinase of the present invention disclosed herein.

Compositions of the present invention can also include other components such as a pharmaceutically acceptable excipient, an adjuvant, and/or a carrier. For example, compositions of the present invention can be formulated in an excipient that the animal to be treated can tolerate.

Examples of such excipients include water, saline, Ringer's solution, dextrose solution, Hank's solution, and other aqueous physiologically balanced salt solutions.

Nonaqueous vehicles, such as fixed oils, sesame oil, ethyl oleate, or triglycerides may also be used. Other useful formulations include suspensions containing viscosity enhancing agents, such as sodium carboxymethylcellulose, sorbitol, or dextran. Excipients can also contain minor amounts of additives, such as substances that enhance isotonicity and chemical stability. Examples of buffers include phosphate buffer, bicarbonate buffer and Tris buffer, while examples of preservatives include thimerosal, m- or o-cresol, formalin and benzyl alcohol. Standard formulations can either be liquid injectables or solids which can be taken up in a suitable liquid as a suspension or solution for injection. Thus, in a non-liquid formulation, the excipient can comprise dextrose, human serum albumin, preservatives, etc., to which sterile water or saline can be added prior to administration.

In one embodiment of the present invention, the composition can also include an immunopotentiator, such as an adjuvant or a carrier. Adjuvants are typically substances that generally enhance the immune response of an animal to a specific antigen. Suitable adjuvants include, but are not limited to, Freund's adjuvant; other bacterial cell wall components; aluminum-based salts; calcium-based salts; silica; polynucleotides; toxoids; serum proteins; viral coat proteins; other bacterial-derived preparations; gamma interferon; block copolymer adjuvants, such as Hunter's Titermax adjuvant (Vaxcel™, Inc., Norcross, GA); Ribi adjuvants (available from Ribi ImmunoChem Research, Inc., Hamilton, MT); and saponins and their derivatives, such as Quil A (available from Superfos Biosector A/S, Denmark). Carriers are typically compounds that increase the half-life of a therapeutic composition in the treated animal. Suitable carriers include, but are not limited to, polymeric controlled release formulations, biodegradable implants, liposomes, bacteria, viruses, oils, esters, and glycols.

One embodiment of the present invention is a controlled release formulation that is capable of slowly releasing a composition of the present invention into an animal. As used herein, a controlled release formulation comprises a composition of the present invention in a controlled release vehicle. Suitable controlled release vehicles include, but are not limited to, biocompatible polymers, other polymeric matrices, capsules, microcapsules, microparticles, bolus preparations, osmotic pumps, diffusion devices, liposomes, lipospheres, and transdermal delivery systems. Other controlled release formulations of the present invention include liquids that, upon administration to an animal, form a solid or a gel in situ. Preferred controlled release formulations are biodegradable (i.e., bioerodible).

A preferred controlled release formulation of the present invention is capable of releasing a composition of the present invention into the blood of the treated animal at a constant rate sufficient to attain therapeutic dose levels of the composition to reduce protease activity in fleas feeding from the animal over a period of time ranging from about 1 to about 12 months. A controlled release formulation of the present invention is capable of effecting a treatment for preferably at least about 1 month, more preferably at least about 3 months and even more preferably for at least about 6 months, even more preferably for at least about 9 months, and even more preferably for at least about 12 months.

In order to protect an animal from flea infestation, a therapeutic composition of the present invention is administered to the animal in an effective manner such that the protease activity of fleas feeding from the blood stream of animals treated with the composition is reduced.

As such, a treated animal is an animal that is competent to reduce the flea burden by reducing flea protease activity, or by reducing flea protease activity and at least one other flea activity. Preferably, the protease activity is reduced by at least about 50 percent, more preferably by at least about 70 percent and even more preferably by at least about 90 percent. Methods to administer compositions to the animal in order to render the animal competent depend on the nature of the composition and administration regime.

Animals administered a protease vaccine with at least one booster shot usually become competent at about the same time as would be expected for any vaccine treatment. For example, animals administered a booster dose about 4 to 6 weeks after a primary dose usually become competent within another about 3 to 4 weeks. Animals administered a composition including an anti-flea protease antibody or protease inhibitor become competent as soon as appropriate serum levels of the compound are achieved, usually within one to three days.
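The vaccine timeline in the example above can be worked through with a short sketch. It treats the "about" figures as exact bounds (booster 4 to 6 weeks after the primary dose, competence within another 3 to 4 weeks), which is a simplifying assumption; the function name and the example date are hypothetical.

```python
from datetime import date, timedelta


def competence_window(primary_dose: date) -> tuple[date, date]:
    """Earliest and latest dates by which a vaccinated animal is expected
    to become competent, using the figures from the example as exact bounds."""
    booster_earliest = primary_dose + timedelta(weeks=4)
    booster_latest = primary_dose + timedelta(weeks=6)
    # Competence follows the booster by another 3 to 4 weeks.
    competent_earliest = booster_earliest + timedelta(weeks=3)
    competent_latest = booster_latest + timedelta(weeks=4)
    return competent_earliest, competent_latest


# Hypothetical primary dose date, for illustration only.
primary = date(1997, 1, 1)
earliest, latest = competence_window(primary)
print((earliest - primary).days, (latest - primary).days)
```

Under these assumptions an animal becomes competent roughly 7 to 10 weeks after the primary dose, compared with one to three days for a passively administered antibody or inhibitor.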

In a preferred embodiment, a composition of the present invention when administered to a host animal is able to reduce flea viability by at least about 50 percent within at least about 21 days after the fleas begin feeding from the treated animal. (Note that fleas usually live about 40 days to about 50 days on one or more animals.) A more preferred composition when administered to a host animal is able to reduce flea viability by at least about 65 percent within at least about 14 days after the fleas begin feeding from the treated animal. An even more preferred composition when administered to an animal is able to reduce flea viability by at least about 90 percent within at least about 7 days after the fleas begin feeding from the treated animal.
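The three viability tiers above can be expressed as a simple lookup. This is a sketch only: it reads each tier as "at least the stated percent reduction, observed no later than the stated day after feeding begins", and it treats the "about" values as exact thresholds. The names `TIERS` and `viability_tier` are hypothetical.

```python
from typing import Optional

# (label, minimum percent reduction in viability, latest day after feeding begins)
TIERS = [
    ("even more preferred", 90.0, 7),
    ("more preferred", 65.0, 14),
    ("preferred", 50.0, 21),
]


def viability_tier(reduction_percent: float, days_after_feeding: int) -> Optional[str]:
    """Return the most stringent tier satisfied by an observed reduction
    in flea viability, or None if no tier is met."""
    for label, min_percent, max_days in TIERS:
        if reduction_percent >= min_percent and days_after_feeding <= max_days:
            return label
    return None


print(viability_tier(92.0, 7))
print(viability_tier(70.0, 10))
print(viability_tier(40.0, 5))
```

Checking the tiers from most to least stringent means a result that satisfies several tiers is reported under the strictest one.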

In another preferred embodiment, a composition of the present invention when administered to a host animal is able to reduce flea fecundity (i.e., egg laying ability) by at least about 50 percent, more preferably by at least about 70 percent, and even more preferably by at least about 90 percent, within at least about 30 days after the fleas begin feeding from the treated animal. (Note that fleas usually do not begin laying eggs until about 7 days after taking a blood meal.)

In accordance with the present invention, compositions are administered to an animal in a manner such that the animal becomes competent to reduce flea protease activity in a flea that feeds from the competent animal; i.e., the animal becomes a treated animal. For example, a flea protease vaccine of the present invention, when administered to an animal in an effective manner, is able to elicit (i.e., stimulate) an immune response that produces an antibody titer in the blood stream of the animal sufficient to reduce flea protease activity. Similarly, an anti-flea protease antibody of the present invention, when administered to an animal in an effective manner, is administered in an amount so as to be present in the animal's blood stream at a titer that is sufficient to reduce flea protease activity. A protease inhibitor compound of the present invention, when administered to an animal in an effective manner, is administered in a manner so as to be present in the animal's blood stream at a concentration that is sufficient to reduce flea protease activity. Oligonucleotide nucleic acid molecules of the present invention can also be administered in an effective manner, thereby reducing expression of flea proteases.

Compositions of the present invention can be administered to animals prior to or during flea infestation. It is to be noted that when vaccines of the present invention are administered to an animal, a time period is required for the animal to elicit an immune response before the animal is competent to inhibit protease activity of fleas feeding from that animal. Methods to obtain an immune response in an animal are known to those skilled in the art.

Acceptable protocols to administer compositions in an effective manner include individual dose size, number of doses, frequency of dose administration, and mode of administration. Determination of such protocols can be accomplished by those skilled in the art. A suitable single dose is a dose that is capable of protecting an animal from flea infestation when administered one or more times over a suitable time period. For example, a preferred single dose of a protease vaccine or a mimetope thereof ranges from about 1 microgram (µg, also denoted ug) to about 10 milligrams (mg) of the composition per kilogram body weight of the animal. Booster vaccinations can be administered from about 2 weeks to several years after the original administration. Booster vaccinations preferably are administered when the immune response of the animal becomes insufficient to protect the animal from flea infestation. A preferred administration schedule is one in which from about 10 µg to about 1 mg of the vaccine per kg body weight of the animal is administered from about one to about two times over a time period of from about 2 weeks to about 12 months. In one embodiment, a booster dose of a composition of the present invention is administered about 4 to 6 weeks after the primary dose, and additional boosters are administered about once or twice a year.
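The per-kilogram dose ranges above scale linearly with body weight, as the following sketch shows. It converts the stated bounds (about 1 microgram to about 10 milligrams per kilogram, with about 10 micrograms to about 1 milligram per kilogram for the preferred schedule) into absolute amounts for one animal. The function name and the 4 kg body weight are hypothetical, and the "about" bounds are treated as exact.

```python
UG_PER_MG = 1000.0  # micrograms per milligram


def dose_range_ug(body_weight_kg: float,
                  low_ug_per_kg: float = 1.0,
                  high_ug_per_kg: float = 10.0 * UG_PER_MG) -> tuple[float, float]:
    """Absolute single-dose range in micrograms for an animal of the given weight.

    Defaults correspond to the broad range of about 1 microgram to about
    10 milligrams per kilogram body weight.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_kg * low_ug_per_kg, body_weight_kg * high_ug_per_kg


# Hypothetical 4 kg animal: broad range, then the preferred schedule range.
print(dose_range_ug(4.0))
print(dose_range_ug(4.0, low_ug_per_kg=10.0, high_ug_per_kg=1.0 * UG_PER_MG))
```

Keeping every quantity in micrograms avoids mixing units when the lower bound is stated in micrograms and the upper bound in milligrams.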

Modes of administration can include, but are not limited to, oral, nasal, topical, transdermal, rectal, and parenteral routes. Parenteral routes can include, but are not limited to subcutaneous, intradermal, intravenous, and intramuscular routes.

In another embodiment, a preferred single dose of an anti-flea protease antibody composition or a mimetope thereof ranges from about 1 µg to about 10 mg of the composition per kilogram body weight of the animal. Anti-flea antibodies can be re-administered from about 1 hour to about biweekly for several weeks following the original administration. Booster treatments preferably are administered when the titer of antibodies of the animal becomes insufficient to protect the animal from flea infestation. A preferred administration schedule is one in which from about 10 µg to about 1 mg of an anti-flea protease antibody composition per kg body weight of the animal is administered about every 2 to every 4 weeks.

Suitable modes of administration are as disclosed herein and are known to those skilled in the art.

According to one embodiment, a nucleic acid molecule of the present invention can be administered to an animal in a fashion to enable expression of that nucleic acid molecule into a protective protein (e.g., flea protease vaccine, anti-flea protease antibody, or proteinaceous protease inhibitor) or protective RNA (e.g., antisense RNA, ribozyme or RNA drug) in the animal to be protected from disease. Nucleic acid molecules can be delivered to an animal in a variety of methods including, but not limited to, direct injection (e.g., as "naked" DNA or RNA molecules, such as is taught, for example in Wolff et al., 1990, Science 247, 1465-1468) or packaged as a recombinant virus particle vaccine or as a recombinant cell vaccine (i.e., delivered to a cell by a vehicle selected from the group consisting of a recombinant virus particle vaccine and a recombinant cell vaccine).

A recombinant virus particle vaccine of the present invention includes a recombinant molecule of the present invention that is packaged in a viral coat and that can be expressed in an animal after administration. Preferably, the recombinant molecule is packaging-deficient. A number of recombinant virus particles can be used, including, but not limited to, those based on alphaviruses, poxviruses, adenoviruses, herpesviruses, and retroviruses. When administered to an animal, a recombinant virus particle vaccine of the present invention infects cells within the immunized animal and directs the production of a protective protein or RNA nucleic acid molecule that is capable of protecting the animal from disease caused by a parasite of the present invention. A preferred single dose of a recombinant virus particle vaccine of the present invention is from about 1 x 10^4 to about 1 x 10^7 virus plaque forming units (pfu) per kilogram body weight of the animal.

Administration protocols are similar to those described herein for protein-based vaccines.

A recombinant cell vaccine of the present invention includes recombinant cells of the present invention that express at least one protein of the present invention.

Preferred recombinant cells include Salmonella, E. coli, Mycobacterium, S. frugiperda, baby hamster kidney, myoblast G8, COS, MDCK and CRFK recombinant cells, with Salmonella recombinant cells being more preferred. Such recombinant cells can be administered in a variety of ways but have the advantage that they can be administered orally, preferably at doses ranging from about 10^8 to about 10^12 bacteria per kilogram body weight. Administration protocols are similar to those described herein for protein-based vaccines.

Recombinant cell vaccines can comprise whole cells or cell lysates.

Compositions of the present invention can be administered to any animal susceptible to flea infestation, including warm-blooded animals. Preferred animals to treat include mammals and birds, with cats, dogs, humans, cattle, chinchillas, ferrets, goats, mice, minks, rabbits, raccoons, rats, sheep, squirrels, swine, chickens, ostriches, quail and turkeys as well as other furry animals, pets and/or economic food animals, being more preferred. Particularly preferred animals to protect are cats and dogs.

The present invention includes compositions to treat flea infestation by any flea. As such, compositions of the present invention can be derived from any flea species.

Preferred fleas to target include fleas of the following genera: Ctenocephalides, Cyopsyllus, Diamanus (Oropsylla), Echidnophaga, Nosopsyllus, Pulex, Tunga, and Xenopsylla, with those of the species Ctenocephalides canis, Ctenocephalides felis, Diamanus montanus, Echidnophaga gallinacea, Nosopsyllus fasciatus, Pulex irritans, Pulex simulans, Tunga penetrans and Xenopsylla cheopis being more preferred. Particularly preferred fleas from which to protect animals include fleas of the species Ctenocephalides felis, Ctenocephalides canis, and Pulex species (e.g., Pulex irritans and Pulex simulans). It is also within the scope of the present invention to administer compositions of the present invention directly to fleas.

The present invention also includes the use of compositions of the present invention to reduce infestation by other ectoparasites as well as the use of compositions including protease vaccines, anti-protease antibodies and compounds that inhibit protease synthesis and/or activity derived from any ectoparasite to reduce ectoparasite infestation, particularly controlled release formulations containing such compositions. Preferred ectoparasites to target include arachnids, insects and leeches. More preferred ectoparasites to target include fleas; ticks, including both hard ticks of the family Ixodidae (e.g., Ixodes and Amblyomma) and soft ticks of the family Argasidae (e.g., Ornithodoros, such as O. parkeri and O. turicata); flies, such as midges (e.g., Culicoides), mosquitos, sand flies, black flies, horse flies, horn flies, deer flies, tsetse flies, stable flies, myiasis-causing flies and biting gnats; ants; spiders; lice; mites; and true bugs, such as bed bugs and kissing bugs, including those carrying Chagas disease. Even more preferred ectoparasites to target include fleas, mosquitos, midges, sandflies, blackflies, ticks and Rhodnius.

The following examples are provided for the purposes of illustration and are not intended to limit the scope of the present invention.

Examples

It is to be noted that the Examples include a number of molecular biology, microbiology, immunology and biochemistry techniques considered to be known to those skilled in the art. Disclosure of such techniques can be found, for example, in Sambrook et al., ibid., Borovsky, Arch. Insect Biochem. and Phys., 7:187-210, 1988, and related references. Examples 1 through 21, and the sequence information provided in the sequence listing therein, of related PCT Publication No. WO 96/11706, published April 25, 1996, are incorporated herein by this reference in their entirety.

Example 1

This example describes the determination of internal amino acid sequence of a flea aminopeptidase.

About 10,200 cat blood-fed flea guts were dissected into 4 ml Gut Dissection Buffer (50 mM Tris-HCl, pH 8.0 and 100 mM CaCl₂). Flea gut extracts were prepared by sonicating the flea guts and centrifuging them at about 14,000 rpm for about 20 minutes. The resulting pellet was washed and briefly sonicated in 2 ml Gut Dissection Buffer and centrifuged again at about 14,000 rpm for about [...] minutes. The resulting pellet was resuspended and sonicated in 4 ml buffer comprising 20 mM NaAc, pH [...], 0.1% Brij, complete protease inhibitor cocktail (available from Pierce) and 0.25 mM bestatin; the sonicate was centrifuged at about 14,000 rpm for about 20 minutes. Both the pellet and supernatant were recovered. The pellet was re-sonicated and centrifuged as above, and the resulting supernatant was combined with the original supernatant.

The pooled supernatant was applied to a polyCAT cation exchange HPLC column and protein was eluted with a NaCl gradient ranging from 0 M to 1 M NaCl in 20 mM NaAc, pH [...]. Fractions collected from the column were assayed by H-Leu-AMC fluorescence, and active fractions were pooled and applied to a C-1 reverse phase HPLC column (TMS 250, TosoHaas). Proteins were eluted from the column using an acetonitrile gradient in 0.1% TFA in water, the gradient ranging between 20% and 100% acetonitrile. Proteins contained in fractions from the column were analyzed by SDS-PAGE gel electrophoresis and silver staining. The results of the gel electrophoresis indicated the presence of an about 95 kDa protein in some of the fractions. This protein correlates with the about 95 kDa protein described in Example 12 of related PCT Publication No. WO 96/11706, which was identified using membrane pellet from flea midgut lysates.
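The two elution gradients above are given only by their end points. As a rough illustration, and assuming a linear gradient (the text does not state the gradient shape), the nominal eluent composition at any point of a run can be sketched as follows; the function name and the sample fractions are hypothetical.

```python
def linear_gradient(start: float, end: float, fraction: float) -> float:
    """Nominal concentration at a given fraction (0..1) of a linear gradient.

    Linearity is an assumption made for illustration only.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return start + (end - start) * fraction


# Cation exchange step: NaCl gradient from 0 M to 1 M, halfway through.
nacl_midpoint = linear_gradient(0.0, 1.0, 0.5)

# Reverse phase step: acetonitrile gradient from 20% to 100%, one quarter through.
acn_quarter = linear_gradient(20.0, 100.0, 0.25)

print(nacl_midpoint, acn_quarter)
```

A fraction collected a quarter of the way through the acetonitrile gradient would therefore nominally elute at 40% acetonitrile under this assumption.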

To determine internal amino acid sequence of the 95 kDa protein, those fractions containing the 95 kDa protein were pooled, dried and digested with BNPS-Skatole for about 72 hours at room temperature. The BNPS-Skatole digest was separated by 18% Tris-glycine PAGE gel electrophoresis and blotted onto PVDF membrane. A major band of about 28 kDa was cut out and N-terminally sequenced using techniques as described in Example 7 of related PCT Publication No. WO 96/11706. A partial N-terminal amino acid sequence of the internal peptide was obtained, namely LATTQFQATHARSAFPCFDEPAM (denoted herein SEQ ID NO:107).

Example 2

This example describes the cloning and sequencing of another flea aminopeptidase nucleic acid molecule.

Primer APN3, corresponding to a conserved region in Manduca sexta and rat aminopeptidases and having nucleic acid sequence 5' CCC AAA TTT TCC ATW GCN CCN GC 3' (N indicating any nucleotide; represented herein as SEQ ID NO:108), was used in combination with the M13 reverse primer (SEQ ID NO:87) to PCR amplify a portion of a flea aminopeptidase gene from a bovine blood-fed whole flea cDNA expression library as described in Example 8 of related PCT Publication No. WO 96/11706. The resulting product of the PCR amplification was diluted about 1:50 and used as a template in a second, semi-nested PCR amplification using primer APN3 in combination with degenerate primer APN1C, designed using SEQ ID NO:107 (described in Example 1), having nucleic acid sequence 5' CAA TTY CAA GCT ACY CAT GC 3' (represented herein as SEQ ID NO:109). The resulting PCR product, named nfAP2₃₈₃, was approximately 383 bp when visualized on a 1% agarose gel. The PCR product nfAP2₃₈₃ was gel purified and cloned into the TA Vector® System, and subjected to standard DNA sequencing techniques. The nucleotide sequence of nfAP2₃₈₃ is denoted SEQ ID NO:110.
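Both primers above contain degenerate positions. A minimal sketch of how such primers expand into concrete sequences is given below, assuming the standard IUPAC meanings of the codes (W = A or T, Y = C or T, N = any nucleotide); only N is defined in the text itself, and the helper name `expand` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes used by the two primers (standard meanings assumed).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT",    # A or T
    "Y": "CT",    # C or T
    "N": "ACGT",  # any nucleotide
}


def expand(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in bases))]


APN3 = "CCC AAA TTT TCC ATW GCN CCN GC"  # SEQ ID NO:108
APN1C = "CAA TTY CAA GCT ACY CAT GC"     # SEQ ID NO:109

apn3_variants = expand(APN3)
apn1c_variants = expand(APN1C)
print(len(apn3_variants), len(apn1c_variants))
```

Under these assumptions APN3 represents 2 x 4 x 4 = 32 sequences and APN1C represents 2 x 2 = 4 sequences.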

Translation of SEQ ID NO:110 yielded a deduced flea aminopeptidase protein of about 127 amino acids, denoted herein as PfAP2₁₂₇, having amino acid sequence SEQ ID NO:111.
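The translation step can be illustrated with a short sketch that reads complete codons from a chosen 1-based start position using the standard genetic code. This is not the software used in the original work, which is not named; the function name is hypothetical, and the test sequence is one concrete variant of primer APN1C (SEQ ID NO:109) rather than SEQ ID NO:110, which is not reproduced in this section.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, first_codon_start: int = 1) -> str:
    """Translate complete codons, starting at a 1-based nucleotide position."""
    seq = seq.replace(" ", "").upper()
    offset = first_codon_start - 1
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# One concrete variant of degenerate primer APN1C (Y read as T in both places).
peptide = translate("CAA TTT CAA GCT ACT CAT GC")
print(peptide)  # QFQATH, which occurs within SEQ ID NO:107

# A 383-nucleotide sequence read from nucleotide 1 holds 127 complete codons.
print(len(translate("A" * 383)))
```

The second call shows why a 383-bp product corresponds to a deduced protein of about 127 amino acids, assuming the reading frame starts at the first nucleotide.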

The PCR product nfAP2₃₈₃ was labelled with ³²P and used as a probe to screen a bovine blood-fed whole flea phage expression library using standard hybridization techniques.

A single plaque purified clone was isolated, which included a 2100-nucleotide insert, referred to herein as nfAP2₂₁₀₀. Partial nucleic acid sequence was obtained using standard techniques from the 5' end of nfAP2₂₁₀₀ to yield a flea aminopeptidase nucleic acid molecule named nfAP2[...], having nucleic acid sequence SEQ ID NO:112. Translation of SEQ ID NO:112 suggests that nucleic acid molecule nfAP2[...] encodes a non-full-length flea aminopeptidase protein of about 178 amino acids, referred to herein as PfAP2₁₇₈, having amino acid sequence SEQ ID NO:113, assuming the first codon spans from about nucleotide 2 through about nucleotide 4 of SEQ ID NO:112. SEQ ID NO:113 contains SEQ ID NO:107.

Flea aminopeptidase nucleic acid sequence SEQ ID NO:112 was compared with additional nucleic acid sequences characterized from other organisms. SEQ ID NO:112 is about 50% identical to the Manduca sexta aminopeptidase N nucleic acid sequence over corresponding regions of the two nucleic acid molecules.
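Percent identity figures such as the one above depend on how the two sequences are aligned and on how gaps are counted, neither of which is specified here. The sketch below shows one common convention, assumed for illustration only: identical positions divided by the number of aligned positions in which neither sequence has a gap. The two short sequences are invented toy data, not flea or Manduca sexta sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Both inputs must already be aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy aligned sequences (hypothetical), differing at 3 of 10 positions.
toy_a = "ATGGCTAAGC"
toy_b = "ATGCCTTAGG"
print(percent_identity(toy_a, toy_b))
```

Other conventions, for example counting gapped columns as mismatches, would give lower values for the same alignment.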

Example 3

This example describes the cloning and sequencing of a flea cysteine protease nucleic acid molecule.

A flea cysteine protease nucleic acid molecule, referred to herein as nfCP1₅₇₃, was produced by PCR amplification using the following method. Primer Cal3F (designed to obtain a calreticulin gene), having nucleic acid sequence 5' TTG GGA TAC ACT TTG ACT GTT AAC C 3', represented herein as SEQ ID NO:97, was used in combination with the M13 universal primer to PCR amplify, using standard techniques, a DNA fragment from a bovine blood-fed whole flea cDNA expression library as described above in Example 8 of related PCT Publication No. WO 96/11706.

Surprisingly, the isolated DNA fragment correlated with a cysteine protease nucleic acid sequence. Sequence from this DNA fragment was used to design primer Cys1R, having the nucleic acid sequence 5' GTG AGC AAC CAT TAT TTC CAT ATC 3', represented herein as SEQ ID NO:98, which was used in a second PCR amplification in combination with the M13 reverse primer. A third PCR amplification was performed using primer Cys1F, having the nucleic acid sequence 5' CTT TCC TCA CAA TAC CAC CAA GGA AGC 3', represented herein as SEQ ID NO:74, in combination with the M13 universal primer.

A fourth PCR amplification was performed using primer Cys2F, having the nucleic acid sequence 5' CTT GTA CGA TTG TCT CAA CAG GC 3', represented herein as SEQ ID NO:76, in combination with the M13 universal primer. The resulting PCR products were each gel purified and cloned into the TA Vector® System, and subjected to standard DNA sequencing techniques. A composite nucleic acid sequence representing a flea cysteine protease coding region, referred to herein as nfCP1₅₇₃, was deduced and is denoted herein as SEQ ID NO:76. Translation of SEQ ID NO:76 suggests that nucleic acid molecule nfCP1₅₇₃ encodes a non-full-length flea cysteine protease protein of about 191 amino acids, referred to herein as PfCP1₁₉₁, having amino acid sequence SEQ ID NO:77, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:76.

The nucleic acid and amino acid sequences of the nfCP1₅₇₃ nucleic acid molecule and PfCP1₁₉₁ protein, respectively, were compared to known nucleic acid and amino acid sequences using a Genbank homology search. SEQ ID NO:77 was found to be similar to the amino acid sequence of P. sativum cysteine protease. The most highly conserved region of continuous similarity between SEQ ID NO:77 and P. sativum cysteine protease amino acid sequences spans from about amino acid 71 through about amino acid 165 of SEQ ID NO:77 and from about amino acid 17 through about amino acid 168 of the P. sativum cysteine protease, there being about 42% identity between the two regions. Comparison of the corresponding nucleic acid sequences, spanning from about nucleotide 205 through about nucleotide 492 of nfCP1₅₇₃, indicates that those regions are about 54% identical.

Example 4

This example describes the cloning and sequencing of certain flea serine protease nucleic acid molecules.

Additional serine protease cDNA nucleic acid molecules have been isolated in a manner similar to that described in Example 8 of related PCT Publication No. WO 96/11706. The actual primers used in PCR amplification of serine protease nucleic acid molecules from a bovine blood-fed flea cDNA expression library (produced as described in Example 8 of related PCT Publication No. WO 96/11706) included cat-try #2 (SEQ ID NO:86) in combination with either M13 reverse primer (SEQ ID NO:87) or H57 primer (SEQ ID NO:99). The resultant PCR products were gel purified and cloned into the TA Vector™. Two recombinant TA vector clones were isolated and found to correspond to previously cloned serine protease genes. These newly cloned nucleic acid molecules were subjected to nucleic acid sequencing using the Sanger dideoxy chain termination method, as described in Sambrook et al., ibid.

A. A nucleic acid sequence of the flea serine protease nucleic acid molecule corresponding to flea clone 5 (produced using primers cat-try #2 and M13 reverse), namely nfSP5₈₀₆, is represented herein as SEQ ID NO:114. SEQ ID NO:116 and SEQ ID NO:117 are both contained within the sequence of the nucleic acid molecule nfSP5₈₀₆. Translation of SEQ ID NO:114 suggests that nucleic acid molecule nfSP5₈₀₆ encodes a close to full-length flea serine protease protein of about 245 amino acids, referred to herein as PfSP5₂₄₅, having amino acid sequence SEQ ID NO:115, assuming an open reading frame in which the first codon spans from about nucleotide 2 through about nucleotide 4 of SEQ ID NO:114 and a stop codon spanning from about nucleotide 737 through about nucleotide 739 of SEQ ID NO:114. A Genbank homology search revealed most homology between SEQ ID NO:114 and a Gallus gallus trypsin gene, there being about 52% identity between corresponding regions of the two nucleic acid molecules.

B. A nucleic acid sequence of the flea serine protease nucleic acid molecule corresponding to flea clone 11 (produced using primers cat-try #2 and M13 reverse), namely nfSP11₃₀[...], is represented herein as SEQ ID NO:118. SEQ ID NO:120 and SEQ ID NO:121 are within the sequence of the nucleic acid molecule nfSP11₃₀[...]. Translation of SEQ ID NO:118 suggests that nucleic acid molecule nfSP11₃₀[...] encodes a non-full-length flea serine protease protein of about 102 amino acids, referred to herein as PfSP11₁₀₂, having amino acid sequence SEQ ID NO:119, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:118.

C. A nucleic acid sequence of the flea serine protease nucleic acid molecule corresponding to flea clone 39 (produced using primers cat-try #2 and H57), namely nfSP39₂₆₇, is represented herein as SEQ ID NO:122. Translation of SEQ ID NO:122 suggests that nucleic acid molecule nfSP39₂₆₇ encodes a non-full-length flea serine protease protein of about 89 amino acids, referred to herein as PfSP39₈₉, having amino acid sequence SEQ ID NO:123, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:122.
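The protein lengths quoted in sections A through C follow from the stated codon positions by simple arithmetic. The sketch below, with hypothetical function names, reproduces two of them: the number of codons between a first codon and an in-frame stop codon, and the number of complete codons in a fragment read from a given position. It takes the quoted nucleotide positions at face value, although the text qualifies each of them with "about".

```python
def codons_before_stop(first_codon_start: int, stop_codon_start: int) -> int:
    """Number of codons from the first codon up to, not including, the stop codon.

    Positions are 1-based, as in the text.
    """
    span = stop_codon_start - first_codon_start
    if span < 0 or span % 3 != 0:
        raise ValueError("stop codon is not in frame with the first codon")
    return span // 3


def complete_codons(length: int, first_codon_start: int = 1) -> int:
    """Number of complete codons in a sequence of the given length."""
    return (length - first_codon_start + 1) // 3


# Section A: first codon at nucleotides 2-4, stop codon at nucleotides 737-739.
nfsp5_residues = codons_before_stop(2, 737)

# Section C: a 267-nucleotide molecule read from nucleotide 1.
nfsp39_residues = complete_codons(267, 1)

print(nfsp5_residues, nfsp39_residues)
```

The first result matches the "about 245 amino acids" stated for the clone 5 protein; the second gives 89 codons for a 267-nucleotide fragment.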

Example 5

This example describes the cloning and sequencing of certain flea serine protease nucleic acid molecules.

A. Bovine Blood-Fed Library

Certain flea serine protease cDNA nucleic acid molecules have been isolated in a manner similar to that described in Example 8 of related PCT Publication No. WO 96/11706, using two nucleic acid molecules as probes to screen a bovine blood-fed flea cDNA expression library (produced as described in Example 8 of related PCT Publication No. WO 96/11706), namely cat-try #1 (SEQ ID NO:124) and cat-try #2 (SEQ ID NO:86). Two clones that hybridized strongly to the probes were isolated and subjected to nucleic acid sequencing using the Sanger dideoxy chain termination method, as described in Sambrook et al., ibid.

1. The nucleic acid sequence of a flea serine protease nucleic acid molecule correlating to flea clone 8, namely nfSP8₄₃₆, is represented herein as SEQ ID NO:125. SEQ ID NO:127 is within the sequence of the nucleic acid molecule nfSP8₄₃₆. Translation of SEQ ID NO:125 yields a protein of about 145 amino acids, denoted PfSP8₁₄₅, having amino acid sequence SEQ ID NO:126, assuming the first codon spans from about nucleotide 2 through about nucleotide 4 of SEQ ID NO:125. A Genbank homology search revealed most homology between SEQ ID NO:125 and an Anopheles gambiae trypsin precursor gene, there being about 48% identity between corresponding regions of the two nucleic acid molecules.

2. The nucleic acid sequence of a flea serine protease nucleic acid molecule corresponding to flea clone 12, namely nfSP12₇₅₈, is represented herein as SEQ ID NO:128. SEQ ID NO:130 and SEQ ID NO:131 are both contained within the sequence of the nucleic acid molecule nfSP12₇₅₈. Translation of SEQ ID NO:128 yields a protein of about 246 amino acids, denoted PfSP12₂₄₆, having amino acid sequence SEQ ID NO:129, assuming an open reading frame in which the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:128 and a stop codon spanning from about nucleotide 739 through about nucleotide 741 of SEQ ID NO:128. A Genbank homology search revealed most homology between SEQ ID NO:128 and a rat trypsinogen gene, there being about 57% identity between corresponding regions of the two nucleic acid molecules.

B. Cat Blood-Fed Library

Certain flea serine protease cDNA genes have been isolated from a cat blood-fed flea cDNA expression library by screening the library with the cat-try #1 (SEQ ID NO:124) and cat-try #2 (SEQ ID NO:86) probes. The cat blood-fed flea library was produced in a similar manner as the bovine blood-fed flea library (described in Example 8 of related PCT Publication No. WO 96/11706) except the fleas were fed on cat blood. Two clones that hybridized strongly to the probes were isolated and subjected to nucleic acid sequencing using methods described above.

1. The nucleic acid sequence of one of the flea serine protease nucleic acid molecules, namely nfSP26[...], is represented herein as SEQ ID NO:132. Translation of SEQ ID NO:132 yields a non-full-length sequence of about 185 amino acids, denoted PfSP26₁₈₅, having amino acid sequence SEQ ID NO:133, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:132. A Genbank homology search revealed most homology between SEQ ID NO:133 and an Aedes aegypti trypsin protein sequence, there being about 48% identity between corresponding regions of the two amino acid sequences.

2. The nucleic acid sequence of a flea serine protease nucleic acid molecule, namely nfSP27₃₈[...], is represented herein as SEQ ID NO:134. Translation of SEQ ID NO:134 yields a protein of about 128 amino acids, denoted PfSP27₁₂₈, having amino acid sequence SEQ ID NO:135, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:134.

Example 6

This example describes the cloning and sequencing of certain flea serine protease nucleic acid molecules.

Certain serine protease cDNA nucleic acid molecules have been isolated from reverse transcriptase PCR amplification of mRNA isolated from cat blood-fed whole fleas. The mRNA was isolated from fleas gathered over 72 hours after the initiation of feeding on cat blood. As such, the mRNA comprised a mixture of mRNA isolated at different time points over 72 hours. The mRNA was isolated using ground-up fleas, extracting total flea RNA using Tri-Reagent (available from Molecular Research Center, Cincinnati, Ohio) and an Invitrogen Fast Track™ RNA isolation kit (available from Invitrogen, Inc., San Diego, CA). cDNA was synthesized using a Stratagene RT-PCR kit (available from Stratagene, Inc., San Diego, CA). Primers used for first-strand cDNA synthesis included an equal molar mixture of the following: 5'dT-2VT3' and 5'dT-2VC3' (as provided in a differential display kit, available from Operon Technologies, Inc., Alameda, CA).

The actual primers used in the PCR amplification of the cDNA described above included cat-try #2 (SEQ ID NO:86) used in combination with H57 primer (SEQ ID NO:99). The resultant PCR products were gel purified and cloned into the TA Vector™. Recombinant TA vector clones were isolated and the nucleic acid molecules were subjected to nucleic acid sequencing as described above.

A. A nucleic acid sequence of one of the flea serine protease nucleic acid molecules, namely nfSP23₄₂₃, is represented herein as SEQ ID NO:136. Translation of SEQ ID NO:136 suggests that nucleic acid molecule nfSP23₄₂₃ encodes a non-full-length flea serine protease protein of about 141 amino acids, referred to herein as PfSP23₁₄₁, having amino acid sequence SEQ ID NO:137, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:136. A Genbank homology search revealed most homology between SEQ ID NO:136 and a Homo sapiens plasminogen precursor gene, there being about 51% identity between corresponding regions of the two nucleic acid molecules.

B. Another nucleic acid sequence of a flea serine protease nucleic acid molecule, namely nfSP24[...], is represented herein as SEQ ID NO:78. Translation of SEQ ID NO:78 suggests that nucleic acid molecule nfSP24[...] encodes a non-full-length flea serine protease protein of about 136 amino acids, referred to herein as PfSP24₁₃₆, having amino acid sequence SEQ ID NO:79, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:78. A Genbank homology search revealed most homology between SEQ ID NO:79 and an Anopheles gambiae chymotrypsin protein sequence, there being about 38% identity between corresponding regions of the two amino acid sequences.

C. Another nucleic acid sequence of a flea serine protease nucleic acid molecule, namely nfSP33₄₂₆, is represented herein as SEQ ID NO:82. Translation of SEQ ID NO:82 suggests that nucleic acid molecule nfSP33₄₂₆ encodes a non-full-length flea serine protease protein of about 142 amino acids, referred to herein as PfSP33₁₄₂, having amino acid sequence SEQ ID NO:83, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:82. A Genbank homology search revealed most homology between SEQ ID NO:83 and a Drosophila serine protease stubble protein sequence, there being about 45% identity between corresponding regions of the two amino acid sequences.

D. Another nucleic acid sequence of a flea serine protease nucleic acid molecule, namely nfSP36₁₉[...], is represented herein as SEQ ID NO:138. SEQ ID NO:138 represents a partial sequence of a PCR amplified nucleic acid molecule nfSP36[...]. Translation of SEQ ID NO:138 suggests that nucleic acid molecule nfSP36₁₉[...] encodes a non-full-length flea serine protease protein of about 65 amino acids, referred to herein as PfSP36₆₅, having amino acid sequence SEQ ID NO:139, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:138. A Genbank homology search revealed most homology between SEQ ID NO:139 and a Drosophila melanogaster easter protein sequence, there being about 42% identity between corresponding regions of the two amino acid sequences.

E. Another nucleic acid sequence of a flea serine protease nucleic acid molecule, namely nfSP38₃₄₁, is represented herein as SEQ ID NO:140. Translation of SEQ ID NO:140 suggests that nucleic acid molecule nfSP38₃₄₁ encodes a non-full-length flea serine protease protein of about 113 amino acids, referred to herein as PfSP38₁₁₃, having amino acid sequence SEQ ID NO:141, assuming the first codon spans from about nucleotide 3 through about nucleotide 5 of SEQ ID NO:140. A Genbank homology search revealed most homology between SEQ ID NO:141 and a rat trypsinogen protein sequence, there being about 30% identity between corresponding regions of the two amino acid sequences.

F. A nucleic acid sequence of one of the flea serine protease nucleic acid molecules, namely nfSP34₃₉₀, is represented herein as SEQ ID NO:142. Translation of SEQ ID NO:142 suggests that nucleic acid molecule nfSP34₃₉₀ encodes a non-full-length flea serine protease protein of about 130 amino acids, referred to herein as PfSP34₁₃₀, having amino acid sequence SEQ ID NO:143, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:142. A Genbank homology search revealed most homology between SEQ ID NO:143 and a Drosophila melanogaster Delta precursor protein sequence, there being about 33% identity between corresponding regions of the two amino acid sequences.

Example 7

This example describes the cloning and sequencing of a flea serine protease nucleic acid molecule.

A serine protease cDNA nucleic acid molecule was isolated in a manner similar to that described in Example 8 of related PCT Publication No. WO 96/11706. The actual primers used in PCR amplification of the serine protease nucleic acid molecule from a cat blood-fed whole flea cDNA expression library (produced as described in Example 5) included cat-try #2 (SEQ ID NO:86) in combination with M13 reverse primer (SEQ ID NO:87). The resulting PCR product was diluted 1:25 and used as a template in a second PCR reaction using the forward vector primer T3 in combination with the reverse primer (derived from the nucleic acid sequence of nfSP33₇₇₈ described in Example 6) having the nucleic acid sequence 5' ATT CCT CGT GGT TCA GTC GCT C 3', represented herein as SEQ ID NO:100. The resultant PCR product was gel purified and cloned into the TA Vector™. The clones were subjected to nucleic acid sequencing as described above.
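A reverse primer is written 5' to 3' on the strand opposite the one it is derived from, so locating it within a sequence generally means searching for its reverse complement. The following minimal sketch computes that reverse complement for the primer of SEQ ID NO:100; the assumption that the corresponding sequence listing entry is written in the sense orientation is illustrative, and the function name is hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (spaces ignored)."""
    return seq.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


REVERSE_PRIMER = "ATT CCT CGT GGT TCA GTC GCT C"  # SEQ ID NO:100

# Sequence one would search for on the opposite strand.
binding_site = reverse_complement(REVERSE_PRIMER)
print(binding_site)  # GAGCGACTGAACCACGAGGAAT
```

Applying the function twice returns the original primer, which is a convenient sanity check.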

A nucleic acid sequence of a flea serine protease nucleic acid molecule, namely nfSP33₇₇₈, is represented herein as SEQ ID NO:84. As expected, SEQ ID NO:84 includes a portion of SEQ ID NO:82. Translation of SEQ ID NO:84 suggests that nucleic acid molecule nfSP33₇₇₈ encodes a non-full-length flea serine protease protein of about 259 amino acids, referred to herein as PfSP33₂₅₉, having amino acid sequence SEQ ID NO:85, assuming the first codon spans from about nucleotide 2 through about nucleotide 4 of SEQ ID NO:84.

A Genbank homology search revealed most homology between SEQ ID NO:84 and a Drosophila serine protease stubble gene, there being about 54% identity between nucleotides 23-778 of SEQ ID NO:84 and nucleotides 2324-3064 of the Drosophila serine protease stubble gene.

Example 8

This example describes the cloning and sequencing of another flea serine protease nucleic acid molecule.

Using the method described in Example 5, a cDNA clone of a flea serine protease was obtained using mRNA isolated from bovine blood-fed whole fleas. The resulting cDNA was used as a template in PCR amplification using the primer cat-try #2 (SEQ ID NO:86) in combination with H57 primer (SEQ ID NO:99). The resultant PCR products were gel purified and cloned into the TA Vector. One recombinant TA vector clone was isolated, and the flea serine protease nucleic acid molecule, denoted nfSP37₅₀₀, was subjected to nucleic acid sequencing as described in Sambrook et al., ibid.

The nucleic acid sequence of part of the flea serine protease nucleic acid molecule nfSP37₅₀₀, namely nfSP37₂₆₁, is represented herein as SEQ ID NO:144. Translation of SEQ ID NO:144 suggests that nucleic acid molecule nfSP37₂₆₁ encodes a non-full-length sequence of a flea serine protease protein of about 87 amino acids, referred to herein as PfSP37₈₇, having amino acid sequence SEQ ID NO:145, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:144. A Genbank homology search revealed most homology between SEQ ID NO:145 and a chicken trypsinogen protein sequence, there being about 31% identity between corresponding regions of the two amino acid sequences.

Example 9

This example describes the cloning and sequencing of certain larval flea serine protease nucleic acid molecules.

Certain serine protease cDNA nucleic acid molecules have been isolated by PCR amplification from a mixed instar larval cDNA library produced using 1st, 2nd and 3rd instar larvae fed on cat blood. The actual primers used in the PCR amplification included cat-try #2 (SEQ ID NO:86) in combination with either H57 primer (SEQ ID NO:99) or M13 reverse primer (SEQ ID NO:87). The resultant PCR products were gel purified and cloned into the TA Vector™. Three recombinant TA vector clones were isolated containing PCR products using cat-try #2 and M13 reverse as primers and one clone was isolated containing PCR products using cat-try #2 and H57 primers. These newly cloned nucleic acid molecules were subjected to nucleic acid sequencing as described above.

A. A nucleic acid sequence of one of the larval flea serine protease nucleic acid molecules isolated using cat-try #2 and M13 reverse primers, namely nfSP29₆₁₂, is represented herein as SEQ ID NO:146. Translation of SEQ ID NO:146 suggests that nucleic acid molecule nfSP29₆₁₂ encodes a close to full-length flea serine protease protein of about 204 amino acids, referred to herein as PfSP29₂₀₄, having amino acid sequence SEQ ID NO:147, assuming an open reading frame in which the first codon spans from about nucleotide 10 through about nucleotide 12 of SEQ ID NO:146. A Genbank homology search revealed most homology between SEQ ID NO:146 and a rat trypsinogen gene, there being about [...]% identity between corresponding regions of the two nucleic acid molecules.

B. Another nucleic acid sequence of one of the larval flea serine protease nucleic acid molecules isolated using cat-try #2 and M13 reverse primers, namely nfSP30₆₄[...], is represented herein as SEQ ID NO:148. Translation of SEQ ID NO:148 suggests that nucleic acid molecule nfSP30₆₄[...] encodes a non-full-length flea serine protease protein of about 213 amino acids, referred to herein as PfSP30₂₁₃, having amino acid sequence SEQ ID NO:149, assuming the first codon spans from about nucleotide 3 through about nucleotide 5 of SEQ ID NO:148. A Genbank homology search revealed most homology between SEQ ID NO:148 and an Anopheles gambiae trypsin gene, there being about 52% identity between corresponding regions of the two nucleic acid molecules.

C. Another nucleic acid sequence of one of the larval flea serine protease nucleic acid molecules isolated using cat-try #2 and M13 reverse primers, namely nfSP31₆₂₆, is represented herein as SEQ ID NO:150. Translation of SEQ ID NO:150 suggests that nucleic acid molecule nfSP31₆₂₆ encodes a non-full-length flea serine protease protein of about 208 amino acids, referred to herein as PfSP31₂₀₈, having amino acid sequence SEQ ID NO:151, assuming the first codon spans from about nucleotide 3 through about nucleotide 5, or from a putative start codon spanning from about nucleotide 6 to about nucleotide 8, of SEQ ID NO:150. A Genbank homology search revealed homology between SEQ ID NO:150 and an Anopheles gambiae trypsin gene, there being about 52% identity between corresponding regions of the two nucleic acid molecules.

D. A nucleic acid sequence of a larval flea serine protease nucleic acid molecule isolated using cat-try #2 and H57 primers, namely nfSP32₄₃₃, is represented herein as SEQ ID NO:80. Translation of SEQ ID NO:80 suggests that nucleic acid molecule nfSP32₄₃₃ encodes a non-full-length flea serine protease protein of about 144 amino acids, referred to herein as PfSP32₁₄₄, having amino acid sequence SEQ ID NO:81, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:80. A Genbank homology search revealed most homology between SEQ ID NO:80 and an Anopheles gambiae trypsin gene, there being about 52% identity between corresponding regions of the two nucleic acid molecules.

Example 10

This example describes the cloning and sequencing of another flea serine protease nucleic acid molecule.

A bovine blood-fed whole flea cDNA library (prepared as described in Example 8 of related PCT Publication No. WO 96/11706) was immunoscreened with antiserum collected from a rabbit that was immunized with a collection of flea salivary gland products referred to as fspN (as described in PCT Publication No. WO 96/11271, entitled "NOVEL ECTOPARASITE SALIVA PROTEINS AND APPARATUS TO COLLECT SUCH PROTEINS", published April 18, 1996). Immunoscreening was performed as follows. New Zealand White rabbit antiserum developed against fspN flea saliva products was used in the immunoscreening protocols described in the picoBlue™ Immunoscreening Kit instruction manual, available from Stratagene, Inc. The methods for preparation of the cDNA expression libraries for immunoscreening, expression of the cDNA clones and procedures for transferring lambda phage plaques to membranes for immunoscreening are described in the ZAP-cDNA Synthesis Kit instruction manual, also available from Stratagene, Inc., La Jolla, California.

A nucleotide sequence for a flea serine protease nucleic acid molecule named nfSP15₈₁₅ is denoted as SEQ ID NO:152 and corresponds to SEQ ID NO:154. Translation of SEQ ID NO:152 suggests that nucleic acid molecule nfSP15₈₁₅ encodes a close to full-length flea serine protease protein of about 254 amino acids, referred to herein as PfSP15₂₅₄, having amino acid sequence SEQ ID NO:153, assuming an open reading frame in which the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:152 and a stop codon spanning from about nucleotide 763 through about nucleotide 765 of SEQ ID NO:152. A Genbank homology search revealed homology between SEQ ID NO:152 and an Anopheles gambiae trypsin gene, there being about 52% identity between corresponding regions of the two nucleic acid molecules.
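Open reading frames such as the one above are delimited by a first codon and the first in-frame stop codon. A minimal sketch of that scan is shown below, assuming the standard stop codons TAA, TAG and TGA; the function name and the short test sequence are hypothetical and are not taken from the sequence listing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq: str, first_codon_start: int = 1):
    """Return the 1-based start of the first in-frame stop codon, or None."""
    seq = seq.replace(" ", "").upper()
    for i in range(first_codon_start - 1, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 1
    return None


# Toy sequence: ATG GCT AAG TGC TAA GG (hypothetical, for illustration only).
toy = "ATGGCTAAGTGCTAAGG"

stop_frame1 = first_in_frame_stop(toy, 1)  # TAA at nucleotides 13-15
stop_frame3 = first_in_frame_stop(toy, 3)  # a different TAA, at nucleotides 6-8
print(stop_frame1, stop_frame3)

# With a first codon at nucleotide 1 and a stop codon at nucleotide 763,
# the open reading frame holds (763 - 1) // 3 = 254 codons.
print((763 - 1) // 3)
```

The toy example also shows why the assumed position of the first codon matters: shifting the frame changes which stop codon is encountered first.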

Example 11

This example describes the cloning and sequencing of additional flea serine protease nucleic acid molecules.

Certain flea serine protease cDNA nucleic acid molecules have been isolated in a manner similar to that described in Example 8 of related PCT Publication No. WO 96/11706, using two nucleic acid molecules as probes to screen an unfed flea cDNA expression library, namely nfSP8₂₉₉ (SEQ ID NO:127) and nfSP19₃₅₉ (SEQ ID NO:155). A clone that hybridized strongly to the probes was isolated and subjected to nucleic acid sequencing as described above.

A. The nucleic acid sequence of the flea serine protease nucleic acid molecule, namely nfSP19[...], is represented herein as SEQ ID NO:156. SEQ ID NO:155 is within the sequence of the nucleic acid molecule nfSP19[...]. Translation of SEQ ID NO:156 yields an apparent full-length protein of about 253 amino acids, denoted PfSP19₂₅₃, having amino acid sequence SEQ ID NO:157, assuming the first codon, an apparent start codon, spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:156. A Genbank homology search revealed most homology between SEQ ID NO:156 and an Aedes aegypti trypsin, there being about 53% identity between corresponding regions of both nucleic acid molecules.

B. The nucleic acid sequence of another flea serine protease nucleic acid molecule, namely nfSP25 864, is represented herein as SEQ ID NO:158. Translation of SEQ ID NO:158 yields a protein of about 260 amino acids, denoted PfSP25 260, having amino acid sequence SEQ ID NO:159, assuming the first codon spans from about nucleotide 2 through about nucleotide 4 of SEQ ID NO:158 and the stop codon spans from about nucleotide 782 through about nucleotide 784 of SEQ ID NO:158. A GenBank homology search revealed most homology between SEQ ID NO:159 and an Anopheles gambiae chymotrypsin protein sequence, there being about 34% identity between corresponding regions of the two amino acid sequences.

Example 12 This example describes the cloning and sequencing of another flea serine protease nucleic acid molecule.

A flea serine protease cDNA nucleic acid molecule has been isolated in a manner similar to that described in Example 8 of related PCT Publication No. WO 96/11706, using nfSP11 252 (SEQ ID NO:121) as a probe to screen a bovine blood-fed flea cDNA expression library (produced as described in Example 8 of related PCT Publication No. WO 96/11706). A clone that hybridized strongly to the probe was isolated and subjected to nucleic acid sequencing using the Sanger dideoxy chain termination method, as described in Sambrook et al., ibid.

The nucleic acid sequence of the flea serine protease nucleic acid molecule, namely nfSP21 9 5 is represented herein as SEQ ID NO:160. Translation of SEQ ID NO:160 yields a protein of about 198 amino acids, denoted PfSP21 198, having amino acid sequence SEQ ID NO:161, assuming the first codon spans from about nucleotide 2 through about nucleotide 4 of SEQ ID NO:160 and a putative stop codon spans from about nucleotide 596 through about nucleotide 598. A GenBank homology search revealed most homology between SEQ ID NO:161 and a Tachypleus tridentatus coagulation factor G protein sequence, there being about 45% identity between corresponding regions of the two amino acid sequences.
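The stated protein lengths in Examples 10 through 12 can be cross-checked against the reported codon positions. The following is a minimal sketch, not part of the disclosed methods; the function name and the table of values are illustrative, and it assumes 1-based nucleotide positions with the stop codon lying in frame with the first codon.

```python
def codons_before_stop(first_codon_start: int, stop_codon_start: int) -> int:
    """Count complete codons from the first codon up to, but not including,
    the stop codon. Positions are 1-based nucleotide numbers."""
    span = stop_codon_start - first_codon_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("stop codon is not in frame with the first codon")
    return span // 3


# (sequence, first codon start, stop codon start, stated protein length)
# Values are taken from the text of Examples 10-12.
reported = [
    ("SEQ ID NO:152", 1, 763, 254),
    ("SEQ ID NO:158", 2, 782, 260),
    ("SEQ ID NO:160", 2, 596, 198),
]

for name, start, stop, stated in reported:
    computed = codons_before_stop(start, stop)
    print(f"{name}: computed {computed} codons, stated about {stated} amino acids")
```

In each of these three cases the codon count implied by the positions equals the stated number of amino acids.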

Example 13 This example describes the isolation and characterization of a 31 kD flea serine protease.

Guts from about 1500 fleas that had been fed on cat blood for about 24 hours were dissected in Gut Dissection Buffer (50 mM Tris, pH 8.0, 100 mM CaCl2). The guts were disrupted by freezing and thawing 4 times, followed by sonication. The resulting extracts were clarified by centrifugation for 20 minutes at 14,000 rpm in a microfuge at 4°C. The supernatant was recovered.

The gut supernatant was loaded onto a 3 ml column comprising p-aminobenzamidine cross-linked to Sepharose beads (Sigma), previously equilibrated in Benzamidine Column Buffer (50 mM Tris, pH 8.0, 100 mM CaCl2, 400 mM NaCl). The supernatant was incubated on the column for about min. Unbound protein was slowly washed off the column using Benzamidine Column Buffer until no protein was detectable by Bradford Assay (Bio-Rad).

Proteases bound to the benzamidine column were then eluted using 10 ml Benzamidine Column Buffer supplemented with 10 mM p-aminobenzamidine (brought to pH 8.0 with NaOH). Proteases in the eluant were concentrated and diafiltered into a volume of about 0.3 ml Gut Dissection Buffer using a Microcon 3 concentrator (Amicon).

About 120 µl of the concentrated eluant was further concentrated to a volume of about 30 µl. Proteases contained in this concentrate were resolved by gel electrophoresis on a 14% Tris-Glycine electrophoresis gel ( µl per lane; approximately 300 gut equivalents per lane). After electrophoresis, the separated proteases were blotted onto a PVDF membrane using a CAPS buffer (10 mM CAPS pH 11, 0.5 mM DTT). The membrane was stained with Coomassie Brilliant Blue. A dominant protein band of about 31 kDa was visualized. The membrane was then used for automated N-terminal sequencing (described in Example 7 of related PCT Publication No. WO 96/11706). A partial N-terminal amino acid sequence of the flea protease was determined to be IVGGEDVDISTCGWC (denoted SEQ ID NO:68).
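The gut-equivalent bookkeeping in this example can be followed with a short calculation. The sketch below assumes, purely for illustration, that no protein is lost during elution or concentration, so that the about 0.3 ml of concentrated eluant still represents all of the about 1500 guts; the variable and function names are hypothetical.

```python
TOTAL_GUTS = 1500          # guts dissected at the start of Example 13
ELUANT_VOLUME_UL = 300.0   # about 0.3 ml after diafiltration
ALIQUOT_UL = 120.0         # portion of the eluant taken for the gel
CONCENTRATED_UL = 30.0     # volume of that portion after further concentration

# Assumption: complete recovery at every step.
guts_per_ul_eluant = TOTAL_GUTS / ELUANT_VOLUME_UL
guts_in_aliquot = guts_per_ul_eluant * ALIQUOT_UL
guts_per_ul_concentrate = guts_in_aliquot / CONCENTRATED_UL


def volume_for_gut_equivalents(gut_equivalents: float) -> float:
    """Volume of final concentrate (in microliters) holding the given
    number of gut equivalents, under the complete-recovery assumption."""
    return gut_equivalents / guts_per_ul_concentrate


print(guts_per_ul_concentrate)          # gut equivalents per microliter
print(volume_for_gut_equivalents(300))  # microliters for 300 gut equivalents
```

Under these assumptions the final concentrate holds about 20 gut equivalents per µl, so about 300 gut equivalents would correspond to roughly 15 µl of concentrate.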

Example 14 This example describes the isolation and characterization of a 31 kD flea serine protease contained in a formulation having IgGase activity (i.e., ability to proteolyze immunoglobulin G proteins).

Cat blood-fed flea gut extracts were prepared and selected on a benzamidine column as described above in Example 13. IgG protease activity was assayed by incubating at 37°C, overnight, the benzamidine eluant with cat immunoglobulin G proteins (IgG) purified on Protein A sepharose. The ability of the flea gut benzamidine eluant to digest cat IgG was detected by resolving the samples by gel electrophoresis through a 14% SDS-PAGE gel and silver staining the gel using standard methods. The marked decrease (compared with control samples lacking protease activity) of a 50 kDa band on the silver stained gel, representing cat IgG heavy chain, indicated that the benzamidine eluant contains IgG protease activity.

The benzamidine eluant was then purified on a PolyPropylaspartamide hydrophobic interaction chromatography (HIC) column by applying the eluant to the column in buffer containing 0.1 M KPO4, pH 6.5, and 2 M (NH4)2SO4. Proteases bound to the column were eluted using an ammonium sulfate gradient of 2 M to 0 M in HIC column buffer. Column fractions were tested for IgG protease activity using the method described above. Fractions containing IgG protease activity were pooled and applied to a PolyCat cation exchange column in 20 mM sodium acetate, pH 6. The proteins were eluted using a sodium chloride gradient of 0 M to 1 M NaCl in 20 mM sodium acetate.

Fractions eluted from the column were tested for IgG protease activity and then each fraction was resolved by electrophoresis using SDS-PAGE. Fractions having the highest levels of IgG protease activity included a protein band that migrated at about 31 kDa on the SDS-PAGE gel.

Weaker protease activity corresponded to an about 28 kDa band.

The 31 kDa protein present on the SDS-PAGE gel was used for N-terminal amino acid sequencing using the blotting method described above. A partial N-terminal amino acid sequence was determined to be IVGGEDVDIST(C)GWQI(S)FQ(S)ENLHF(C)GG(S)IIAPK (denoted herein as SEQ ID NO:69). A comparison of SEQ ID NO:69 and SEQ ID NO:68 (described in Example 13) indicates a single residue difference between the two amino acid sequences at residue 15 of each sequence (i.e., Q and C, respectively).

Since SEQ ID NO:69 correlates with IgGase activity, the data suggests that the larval protein containing SEQ ID NO:68 has IgGase activity.
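The residue-by-residue comparison of SEQ ID NO:68 and SEQ ID NO:69 can be reproduced with a few lines of code. This is a minimal sketch; it assumes that the parentheses in SEQ ID NO:69 mark tentatively assigned residues, which are treated here as ordinary residues, and the helper names are hypothetical.

```python
def strip_tentative(seq: str) -> str:
    """Remove the parentheses around residues (assumed to be tentative calls)."""
    return seq.replace("(", "").replace(")", "")


def differences(a: str, b: str):
    """Return (1-based position, residue in a, residue in b) for each mismatch
    over the region shared by both sequences."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


SEQ_ID_NO_68 = "IVGGEDVDISTCGWC"
SEQ_ID_NO_69 = strip_tentative("IVGGEDVDIST(C)GWQI(S)FQ(S)ENLHF(C)GG(S)IIAPK")

print(differences(SEQ_ID_NO_69, SEQ_ID_NO_68))
```

Over the fifteen residues the two sequences share, the only mismatch reported is at position 15 (Q in SEQ ID NO:69, C in SEQ ID NO:68).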

Example 15 This example describes the cloning and sequencing of a 31 kDa flea serine protease contained in a formulation having IgGase activity.

A flea protease nucleic acid molecule was isolated from a cat blood-fed whole flea library (described in Example 6) and a bovine blood-fed whole flea library (described in Example 8 of related PCT Publication No. WO 96/11706) by PCR amplification. The primers used in the PCR amplification included the FP31A primer, designed using the N-terminal amino acid sequence SEQ ID NO:68 and having the nucleic acid sequence 5' GAA GAT GTW GAT ATT TCW ACA TGT GG 3' (SEQ ID NO:101), used in combination with the M13 universal primer. The resultant PCR products were gel purified, cloned into the TA Vector™ and subjected to nucleic acid sequencing as described above.
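The FP31A primer contains the degenerate base W, which under IUPAC nomenclature stands for A or T. The sketch below, offered only as an illustration, expands the primer into its individual sequences and translates each with the standard genetic code to show how the primer relates to SEQ ID NO:68; the function names are hypothetical, and only the ambiguity code that occurs in this primer is included in the lookup table.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC codes needed for this primer only (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT"}


def expand(primer: str):
    """List every unambiguous sequence represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


FP31A = "GAA GAT GTW GAT ATT TCW ACA TGT GG".replace(" ", "")
SEQ_ID_NO_68 = "IVGGEDVDISTCGWC"

variants = expand(FP31A)
peptides = {translate(v) for v in variants}
print(len(variants), peptides)
print(all(p in SEQ_ID_NO_68 for p in peptides))
```

The two W positions give four primer variants, and all four translate to the same peptide, EDVDISTC, which corresponds to residues 5 through 12 of SEQ ID NO:68.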

A FP31B primer GAA AAT GAA ATC CAC TTA AAC ATT ACG (represented herein as SEQ ID NO:102) was designed using the DNA sequence of a DNA fragment from a bovine blood-fed cDNA library. A flea protease cDNA nucleic acid molecule was isolated by PCR amplification of the cat blood-fed whole flea library and the bovine blood-fed whole flea library described above. PCR amplification was performed using the FP31B primer in combination with M13 reverse primer. The resulting PCR products were then diluted 1:25 and used as a template for a second PCR reaction using primer FP31C, having the sequence 5' CTC TTA TTG TAC GAG GGA TGC 3' (denoted herein SEQ ID NO:103), in combination with T3 primer. The resulting nested PCR product was cloned into TA Vector™ and subjected to DNA sequencing.

The nucleic acid sequence of the resulting flea serine protease nucleic acid molecule, namely nfSP28 923, is represented herein as SEQ ID NO:66. Translation of SEQ ID NO:66 yields a protein of about 267 amino acids, denoted PfSP28 267, having amino acid sequence SEQ ID NO:67, assuming an open reading frame in which the putative start codon spans from about nucleotide 8 through about nucleotide 10 of SEQ ID NO:66 or from about nucleotide 11 through about nucleotide 13, and the stop codon spans from about nucleotide 803 through about nucleotide 805 of SEQ ID NO:66. SEQ ID NO:67 contains SEQ ID NO:68, except that Q is substituted for C, and SEQ ID NO:69. A GenBank homology search revealed most homology between SEQ ID NO:66 and a Bombyx mori vitellin-degrading protease gene, there being about 53% identity between corresponding regions of the two nucleic acid sequences.

Example 16 This example describes 3H-DFP labelling of larval serine proteases.

About 100 unfed larvae, 100 1st instar larvae, and 100 3rd instar larvae were collected in 100 µl Gut Dissection Buffer (50 mM Tris, pH 8.0, 100 mM CaCl2). About 400 µl of water was added to the collected larvae, which were then sonicated. The sonicates were clarified by centrifugation at 15,000 rpm in an SS-34 rotor for 30 minutes at 4°C. The supernatant from each larval sonicate was recovered and concentrated to a volume of about 120 µl (1.2 µl per larval equivalent).

Samples containing about 25 larval equivalents (about 30 µl) were labeled with about 2.5 µCi of 3H-diisopropylfluorophosphate (DFP; available from New England Nuclear) and incubated at 4°C for 18 hours.

Following the incubation period, 5 larval equivalents of each larval stage (about 6 µl) were run on a 14% Tris Glycine SDS-PAGE gel. The gel was then soaked in Entensify (available from New England Nuclear) to enhance the tritium signal for autoradiography, and dried. The dried gel was then exposed to X-ray film (Kodak X-Omat) for about 3 days at -70°C.

The results indicate that gut extracts from unfed larvae do not contain detectable amounts of serine proteases (Fig. 1, lane A), while both fed 1st instar larvae (Fig. 1, lane B) and fed 3rd instar larvae (Fig. 1, lane C) produce serine proteases. In particular, fed 1st instar larvae primarily produce a serine protease having a molecular weight of about 25 kD, and fed 3rd instar larvae produce serine proteases having molecular weights of about 25 kD, 28 kD and 31 kD. The approximate sizes of standard molecular weight protein markers are shown in Fig. 1.

Example 17 This example describes the determination of partial N-terminal amino acid sequences for several larval serine proteases.

About 10,300 3rd instar larvae were collected in Gut Dissection Buffer (50 mM Tris, pH 8.0, 100 mM CaCl2). The larvae were homogenized by sonication and clarified by centrifugation at 15,000 rpm in an SS-34 rotor for 30 min at 4°C. The supernatant was recovered. The 3rd instar supernatant was mixed with 5 ml of p-aminobenzamidine cross-linked to Sepharose beads (available from Sigma) equilibrated in Benzamidine Column Buffer (50 mM Tris, pH 8.0, 100 mM CaCl2, 400 mM NaCl). The supernatant was rocked with the beads overnight at 4°C. The beads were washed in about 45 ml Benzamidine Column Buffer to remove unbound protein. The beads were then mixed for 2 hours at 4°C with about 10 ml of Benzamidine Column Buffer containing 100 mM p-aminobenzamidine (pH 8.0 adjusted with NaOH) to elute proteins bound to the beads. The eluted proteins were then collected. The elution process was repeated once more.

The eluted protein was concentrated by ultrafiltration with a Centriprep 10 concentrator (available from Amicon). The concentrate was diluted with Gut Dissection Buffer to a final volume of about 5 ml.

Partial N-terminal amino acid sequence of proteins eluted from the beads was obtained using the method described in Example 11. Two proteins having molecular weights of about 25 kDa and about 26 kDa were identified on the Coomassie Brilliant Blue stained membranes. Partial N-terminal amino acid sequence obtained for the protein having a molecular weight of about 25 kDa is IVGGVSVNINDYGYQLSLQSNGR, denoted herein as SEQ ID NO:162.

Partial N-terminal amino acid sequence obtained for the protein having a molecular weight of about 26 kDa is IVGGHDTSIKQHPYQV, denoted herein as SEQ ID NO:163.

Example 18 This Example demonstrates the production of certain flea serine protease proteins of the present invention.

A. Flea serine protease protein PfSP1' 2 was produced in the following manner. Flea serine protease nucleic acid molecule nfSP1 70 o, produced as described in Example 20 of related PCT Publication No. WO 96/11706, was digested with XhoI restriction endonuclease, gel purified and subcloned into expression vector λPR/T2ori/S10HIS-RSET-A9 (the production of which is described in Tripp et al., International PCT Publication No. WO 95/24198, published September 14, 1995; see, in particular, Example ) that had been digested with XhoI and dephosphorylated. The resultant recombinant molecule, referred to herein as pHisCro-nfSP1 670, was transformed into E. coli HB101 competent cells (available from Gibco BRL) to form recombinant cell E. coli:pHisCro-nfSP1 670. The recombinant cell was cultured as described in Example 20 of related PCT Publication No. WO 96/11706. Flea serine protease protein PfSP1 2 ,1 was purified by nickel chelation chromatography followed by reverse phase high performance liquid chromatography (HPLC). Immunoblot analysis of the purified PfSP1 2 1 indicated that rabbit anti-flea protease antiserum, produced as described in Example 14, selectively bound to PfSP1, 6.

B. Flea serine protease protein PfSP2 233 was produced in the following manner. Flea serine protease nucleic acid molecule nfSP2, 1 produced as described in Example 20 of related PCT Publication No. WO 96/11706, was digested with XhoI restriction endonuclease, gel purified and subcloned into expression vector λPR/T2ori/S10HIS-RSET-A9 as described in Example 39A. The resultant recombinant molecule, referred to herein as pHisCro-nfSP2,,, was transformed into E. coli HB101 competent cells (available from Gibco BRL) to form recombinant cell E. coli:pHisCro-nfSP2I 1 5. The recombinant cell was cultured as described in Example 20 of related PCT Publication No. WO 96/11706. Flea serine protease protein PfSP2 233 was purified by nickel chelation chromatography followed by reverse phase HPLC. Immunoblot analysis of the purified PfSP2 233 indicated that rabbit anti-flea protease antiserum, produced as described in Example 14 of related PCT Publication No. WO 96/11706, selectively bound to PfSP2 233.

C. Flea serine protease protein PfSP13 225 was produced in the following manner. Flea serine protease nucleic acid molecule nfSP13 700, produced as described in Example 20 of related PCT Publication No. WO 96/11706, was digested with XhoI restriction endonuclease, gel purified and subcloned into expression vector λPR/T2ori/S10HIS-RSET-A9 as described in Example 18A. The resultant recombinant molecule, referred to herein as pHisCro-nfSP13 700, was transformed into E. coli HB101 competent cells (available from Gibco BRL) to form recombinant cell E. coli:pHisCro-nfSP13 700. The recombinant cell was cultured as described in Example 20 of related PCT Publication No. WO 96/11706. Flea serine protease protein PfSP13 225 was purified by nickel chelation chromatography followed by reverse phase HPLC. Immunoblot analysis of the purified PfSP13 225 indicated that rabbit anti-flea protease antiserum, produced as described in Example 14 of related PCT Publication No. WO 96/11706, selectively bound to PfSP13 225.

D. Flea serine protease protein PfSP20 222 was produced in the following manner. An about 669-bp DNA fragment, referred to herein as nfSP20 669 and designed to encode an apparently mature serine protease protein, was PCR amplified from flea serine protease clone 20 using the XhoI-site containing primer F27-S (sense) 5' GAG CTC TCG AGA ATC GTA GGA GGA CAC GAT AC 3' (SEQ ID NO:164) and the EcoRI-site containing primer F20-A (antisense) 5' G GAC GAA TTC TTA AAC ACC AGA CAC TTC CTT G 3' (SEQ ID NO:165). The PCR product nfSP20 669 was digested with XhoI and EcoRI restriction endonucleases, gel purified and subcloned into expression vector λPR/T2ori/S10HIS-RSET-A9 as described in Example 18A. The resultant recombinant molecule, referred to herein as pHisCro-nfSP20 669, was transformed into E. coli HB101 competent cells (available from Gibco BRL) to form recombinant cell E. coli:pHisCro-nfSP20 669. The recombinant cell was cultured as described in Example 20 of related PCT Publication No. WO 96/11706. Immunoblot analysis of recombinant cell E. coli:pHisCro-nfSP20 669 lysates using a T7 tag monoclonal antibody (available from Novagen, Inc.) directed against the fusion portion of the recombinant fusion protein identified a protein of the appropriate size, namely an about 31-kD protein. Flea serine protease protein PfSP20 222 was purified by nickel chelation chromatography followed by reverse phase HPLC. Immunoblot analysis of the purified PfSP20 222 indicated that rabbit anti-flea protease antiserum, produced as described in Example 14 of related PCT Publication No. WO 96/11706, selectively bound to PfSP20 222.
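The design of primers F27-S and F20-A in part D can be inspected directly from their sequences. The following sketch is illustrative only; it uses the well-known recognition sequences for XhoI (CTCGAG) and EcoRI (GAATTC) and the standard genetic code, reads F27-S in the codon grouping in which the primer is written above, and uses hypothetical helper names.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

SITES = {"XhoI": "CTCGAG", "EcoRI": "GAATTC"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


F27_S = "GAG CTC TCG AGA ATC GTA GGA GGA CAC GAT AC".replace(" ", "")
F20_A = "G GAC GAA TTC TTA AAC ACC AGA CAC TTC CTT G".replace(" ", "")

# Each primer should carry the restriction site named in the text.
print(SITES["XhoI"] in F27_S, SITES["EcoRI"] in F20_A)

# The first four written codons of F27-S span the XhoI site; the rest is read as protein.
print(translate(F27_S)[4:])

# Sense-strand view of the antisense primer F20-A.
print(reverse_complement(F20_A))
```

Read this way, the codons of F27-S that follow the XhoI site translate to IVGGHD, the same six residues that begin SEQ ID NO:163 in Example 17, and the sense-strand view of F20-A shows a TAA triplet immediately before the EcoRI site, which is consistent with a stop codon placed ahead of the cloning site.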

Example 19 This example describes that various flea serine protease nucleic acid molecules described in the foregoing examples can be obtained from multiple sources.

Nucleic acid molecules corresponding to flea clone 4 have been obtained from a bovine blood-fed whole flea library (described in Example 8 of related PCT Publication No. WO 96/11706), a cat blood-fed whole flea library (described in Example 6), an unfed whole flea library (described in Example 8 of related PCT Publication No. WO 96/11706), and a mixed instar whole flea library (described in Example 20). Nucleic acid molecules corresponding to flea clone 5 have been obtained from a bovine blood-fed whole flea library and a cat blood-fed whole flea library. Nucleic acid molecules corresponding to flea clone 6 have been obtained from a bovine blood-fed whole flea library, a cat blood-fed whole flea library and an unfed whole flea library. Nucleic acid molecules corresponding to flea clone 7 have been obtained from a bovine blood-fed whole flea library and a cat blood-fed whole flea library. Nucleic acid molecules corresponding to flea clone 8 have been obtained from a bovine blood-fed whole flea library and an unfed whole flea library. Nucleic acid molecules corresponding to flea clone 12 have been obtained from a bovine blood-fed whole flea library and a cat blood-fed whole flea library. Nucleic acid molecules corresponding to flea clone 13 have been obtained from a bovine blood-fed whole flea library, a cat blood-fed whole flea library, and an unfed whole flea library. Nucleic acid molecules corresponding to flea clone 20 have been obtained from a bovine blood-fed whole flea library, a cat blood-fed whole flea library, and an unfed whole flea library. Nucleic acid molecules corresponding to flea clone 28 have been obtained from a bovine blood-fed whole flea library and a cat blood-fed whole flea library.

Example 20 This example provides additional nucleic acid and deduced amino acid sequences of nucleic acid molecules encoding a flea cysteine protease protein of the present invention, which was described in Example 3. This example also provides the production of a cysteine protease protein in E. coli cells.

A. Additional Cysteine Protease Nucleic Acid Molecule

The PCR products described in Example 3 were submitted to additional nucleic acid sequence analysis in order to obtain the nucleic acid sequence of additional portions of the coding region of the cysteine protease gene. A composite nucleic acid sequence representing a flea cysteine protease coding region, referred to herein as nfCP1 1109, was deduced and is denoted herein as SEQ ID NO:1. SEQ ID NO:76 is contained within the sequence of the nucleic acid molecule nfCP1 1109. Translation of SEQ ID NO:1 suggests that nucleic acid molecule nfCP1 1109 encodes a full-length flea cysteine protease protein of about 327 amino acids, referred to herein as PfCP1 327, having amino acid sequence SEQ ID NO:2, assuming an open reading frame in which the initiation codon spans from about nucleotide 126 through about nucleotide 128 of SEQ ID NO:1 and the termination codon spans from about nucleotide 1107 through about nucleotide 1109 of SEQ ID NO:1. The complement of SEQ ID NO:1 is represented herein by SEQ ID NO:3. The coding region encoding PfCP1 327 is represented by nucleic acid molecule nfCP1,, 4 having a coding strand with the nucleic acid sequence represented by SEQ ID NO:4 and a complementary strand with nucleic acid sequence SEQ ID NO:6. The proposed mature protein, denoted herein as PfCP1 226, contains about 226 amino acids and is represented herein as SEQ ID NO:8. The nucleic acid molecule encoding PfCP1 226 is denoted herein as nfCP1 681, which is represented by SEQ ID NO:7. The amino acid sequence of PfCP1 327 (i.e., SEQ ID NO:2) predicts that PfCP1 327 has an estimated molecular weight of about 42 kD and an estimated pI of about 8.84.

Comparison of nucleic acid sequence SEQ ID NO:1 with nucleic acid sequences reported in GenBank indicates that SEQ ID NO:1 showed the most homology, about identity, with the following three genes: a Drosophila cysteine protease gene, a Bombyx cysteine protease gene and a Sarcophaga cysteine protease gene. Comparison of amino acid sequence SEQ ID NO:2 (i.e., the amino acid sequence of PfCP1 327) with amino acid sequences reported in GenBank indicates that SEQ ID NO:2 showed the most homology, i.e., about 42% identity, with the following three proteins: a Drosophila cysteine protease protein, a Bombyx cysteine protease protein and a Sarcophaga cysteine protease protein.

B. Production of Cysteine Protease Protein in E. coli cells.

An about 660-bp nucleic acid molecule, referred to herein as nfCP1 660 (designed to encode an apparently mature cysteine protease protein), was PCR-amplified from a flea mixed instar cDNA library produced using unfed 1st instar, bovine blood-fed 1st instar, bovine blood-fed 2nd instar and bovine blood-fed 3rd instar flea larvae (this combination of tissues is referred to herein as mixed instar larval tissues for purposes of this example). Total RNA was extracted from mixed instar tissue using an acid-guanidinium-phenol-chloroform method similar to that described by Chomczynski et al., 1987, Anal. Biochem. 162, p. 156-159. Approximately 5,164 mixed instar larvae were used in each RNA preparation. Poly A+ selected RNA was separated from each total RNA preparation by oligo-dT cellulose chromatography using Poly(A)Quick® mRNA isolation kits (available from Stratagene Cloning Systems, La Jolla, CA), according to the method recommended by the manufacturer. A mixed instar cDNA expression library was constructed in lambda Uni-ZAP™ XR vector (available from Stratagene Cloning Systems) using Stratagene's ZAP-cDNA Synthesis Kit® protocol. About 6.34 µg of mixed instar poly A+ RNA were used to produce the mixed instar library. The resultant mixed instar library was amplified to a titer of about 2.17 x 10^10 pfu/ml with about 97% recombinants.

The primers used in the PCR amplification were sense primer CysBS' having the nucleotide sequence 5' GAT AAG GAT CCG TTA CCA GAT TCT TTC GAC TGG 3' (containing a BamHI site; denoted SEQ ID NO:64) and anti-sense primer CysHA having the nucleotide sequence 5' TTA TCA AGC TTC CAT TTA CAT GCC GTA AAA ATC 3' (containing a HindIII site; denoted SEQ ID ). The resulting PCR product nfCP1 660 was submitted to nucleic acid sequence analysis to obtain a nucleic acid sequence of the coding strand, represented herein as SEQ ID NO:94. Translation of SEQ ID NO:94 indicated that nfCP1 660 encodes a protein of about 220 amino acids, called PfCP1 220, having SEQ ID NO:95. It is to be noted that this sequence analysis indicated that the stop codon was actually about 36 base pairs upstream from what had been predicted by SEQ ID NO:1; as such, the protein encoded by nfCP1 660 is about 12 amino acids shorter than would have been predicted by SEQ ID NO:1. The nucleic acid molecule nfCP1 660 contains the coding region for PfCP1 220.

Recombinant cell E. coli:pCro-nfCP1 660 was produced in the following manner. Nucleic acid molecule nfCP1 660 was digested with BamHI and HindIII restriction endonucleases, gel purified, and subcloned into expression vector λPR/T2ori/S10HIS-RSET-A9 (the production of which is described in Tripp et al., International PCT Publication No. WO 95/24198, published September 14, 1995; see, in particular, Example ) that was digested with BamHI and HindIII and dephosphorylated. The resultant recombinant molecule, referred to herein as pCro-nfCP1 660, was transformed into E. coli BL-21 competent cells (available from Novagen, Madison, WI) to form recombinant cell E. coli:pCro-nfCP1 660. The recombinant cell was cultured as described in Example 20 of related PCT Publication No. WO 95/24198. About 1 ml of culture was collected prior to induction, and about 1 ml of culture was collected about minutes following induction. These samples were then lysed in sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) loading buffer, resolved on a 14% Tris-glycine acrylamide gel and analyzed by immunoblot using a T7 (tag) antibody (available from Novagen).

Example 21 This example provides additional nucleic acid and deduced amino acid sequences of nucleic acid molecules encoding serine protease proteins of the present invention which are described herein and in the Examples section of related PCT Publication No. WO 96/11706.

A. A DNA probe labeled with 32P comprising nucleotides from nfAP2 210 o (described in Example 23 of related U.S. Patent Application Serial No. 08/639,075, filed April 24, 1996) was used to screen a bovine blood-fed whole flea cDNA library (described in Example 8 of related PCT Publication No. WO 96/11706) using standard hybridization techniques. A clone was isolated having about a 459-nucleotide insert, referred to herein as nfSP18 459. A nucleic acid sequence of the composite nucleic acid molecule produced using nucleic acid sequence from nfSP18, 3 and nfSP18 459 is referred to herein as nfSP18 having a nucleic acid sequence of the coding strand which is denoted herein as SEQ ID NO:9. Translation of SEQ ID NO:9 suggests that nucleic acid molecule nfSP18,. encodes a non-full-length flea serine protease protein of about 228 amino acids, referred to herein as PfSP18 228, having amino acid sequence SEQ ID NO:10, assuming the first codon spans from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:9 and the stop codon spans from about nucleotide 685 through about nucleotide 687 of SEQ ID NO:9. The complement of SEQ ID NO:9 is represented herein by SEQ ID NO:11. The coding region encoding PfSP18 228 is represented by nucleic acid molecule nfSP18 2 2 having a coding strand with the nucleic acid sequence represented by SEQ ID NO:12 and a complementary strand with nucleic acid sequence SEQ ID NO:14. The amino acid sequence of PfSP18 228 (i.e., SEQ ID NO:10) predicts that PfSP18 228 has an estimated molecular weight of about 25 kD and an estimated pI of about 9.09.

Comparison of nucleic acid sequence SEQ ID NO:9 with nucleic acid sequences reported in GenBank indicates that SEQ ID NO:9 showed the most homology, about 51% identity, with an Anopheles stephensi trypsin 1 gene. Comparison of amino acid sequence SEQ ID NO:10 (i.e., the amino acid sequence of PfSP18 228) with amino acid sequences reported in GenBank indicates that SEQ ID NO:10 showed the most homology, about 59% identity, with a Vespa crabro protein.

B. The remainder of flea serine protease nucleic acid molecule clone 24 (described in Example 6) was determined using primers designed from nfSP24 410 to amplify DNA from the bovine blood-fed whole flea cDNA library. Sense primer Flea 24F having the nucleotide sequence 5' GGA CAA ACT GTT CAT TGC AG 3' (denoted SEQ ID NO:46) was used in combination with the M13 universal primer in a first PCR reaction. Anti-sense primer Flea 24R having the nucleotide sequence 5' CCC TCA TTT GTC GTA ACT CC 3' (denoted SEQ ID NO:47) was used in combination with the M13 reverse primer in a second PCR reaction. The resulting PCR products were each gel purified and cloned into the TA Vector® System, and subjected to standard DNA sequencing techniques.

A composite nucleic acid sequence representing a flea serine protease coding region, referred to herein as nfSP24 1089, was deduced and is denoted herein as SEQ ID NO:15. SEQ ID NO:78 is contained within the sequence of the nucleic acid molecule nfSP24 1089. Translation of SEQ ID NO:15 suggests that nucleic acid molecule nfSP24 1089 encodes a full-length flea serine protease protein of about 258 amino acids, referred to herein as PfSP24 258, having amino acid sequence SEQ ID NO:16, assuming an open reading frame in which the initiation codon spans from about nucleotide 33 through about nucleotide 35 of SEQ ID NO:15 and the termination codon spans from about nucleotide 807 through about nucleotide 809 of SEQ ID NO:15. The complement of SEQ ID NO:15 is represented herein by SEQ ID NO:17. The coding region encoding PfSP24 258 is represented by nucleic acid molecule nfSP24 774, having a coding strand with the nucleic acid sequence represented by SEQ ID NO:18 and a complementary strand with nucleic acid sequence SEQ ID NO:20. The proposed mature protein, denoted herein as PfSP24 237, contains about 237 amino acids and is represented herein as SEQ ID NO:22. The nucleic acid molecule encoding PfSP24 237 is denoted herein as nfSP24 711, which is represented by SEQ ID NO:21. The amino acid sequence of PfSP24 258 (i.e., SEQ ID NO:16) predicts that PfSP24 258 has an estimated molecular weight of about 28 kD and an estimated pI of about 6.70.

Comparison of nucleic acid sequence SEQ ID NO:15 with nucleic acid sequences reported in GenBank indicates that SEQ ID NO:15 showed the most homology, about 51% identity, with an Anopheles stephensi trypsin 1 gene. Comparison of amino acid sequence SEQ ID NO:16 (i.e., the amino acid sequence of PfSP24 258) with amino acid sequences reported in GenBank indicates that SEQ ID NO:16 showed the most homology, about 59% identity, with an Anopheles gambiae chymotrypsin II protein.

C. The remainder of flea serine protease nucleic acid molecule clone 32 (described in Example 20) was determined using primers designed from nfSP32 433 to amplify DNA from the cat-fed whole flea cDNA library. Sense primer Flea 32F having the nucleotide sequence 5' GGC TAG GTT AGT GGA TTC TGG 3' (denoted SEQ ID NO:48) was used in combination with the M13 universal primer in a first PCR reaction. Anti-sense primer Flea 32R having the nucleotide sequence 5' GCA AAT CAG TTC CAG AAT CCA CTA ACC 3' (denoted SEQ ID NO:49) was used in combination with the M13 reverse primer in a second PCR reaction. The resulting PCR products were each gel purified and cloned into the TA Vector® System, and subjected to standard DNA sequencing techniques.

A composite nucleic acid sequence representing a flea serine protease coding region, referred to herein as nfSP32₉₂₄, was deduced and is denoted herein as SEQ ID NO:23. SEQ ID NO:80 is contained within the sequence of the nucleic acid molecule nfSP32₉₂₄. Translation of SEQ ID NO:23 suggests that nucleic acid molecule nfSP32₉₂₄ encodes a full-length flea serine protease protein of about 268 amino acids, referred to herein as PfSP32₂₆₈, having amino acid sequence SEQ ID NO:24, assuming an open reading frame in which the initiation codon spans from about nucleotide 6 through about nucleotide 8 of SEQ ID NO:23 and the termination codon spans from about nucleotide 810 through about nucleotide 812 of SEQ ID NO:23. The complement of SEQ ID NO:23 is represented herein by SEQ ID NO:25. The coding region encoding PfSP32₂₆₈ is represented by nucleic acid molecule nfSP32₆₉₉, having a coding strand with the nucleic acid sequence represented by SEQ ID NO:26 and a complementary strand with nucleic acid sequence SEQ ID NO:28. The amino acid sequence of PfSP32₂₆₈ (i.e., SEQ ID NO:24) predicts that PfSP32₂₆₈ has an estimated molecular weight of about 28.6 kD and an estimated pI of about pI 7.36.
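As a rough plausibility check on a predicted molecular weight, a common rule of thumb assigns an average of about 110 Da per amino acid residue. The sketch below applies that rule; the constant, the function name `approx_mass_kd`, and the approach itself are assumptions for illustration, since the text does not state how its estimates were calculated (a composition-based calculation from the actual sequence would be more accurate).

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, not a value from the text


def approx_mass_kd(residue_count: int) -> float:
    """Approximate protein mass in kD from its length alone."""
    return residue_count * AVERAGE_RESIDUE_MASS_DA / 1000.0


# A 268-residue protein such as the full-length protein described above.
print(f"{approx_mass_kd(268):.1f} kD")
```

The length-only approximation lands within about 1 kD of the 28.6 kD estimate given above, which is the level of agreement one would expect from such a coarse rule.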

Comparison of nucleic acid sequence SEQ ID NO:23 with nucleic acid sequences reported in GenBank indicates that SEQ ID NO:23 showed the most homology, i.e., about 52% identity, between SEQ ID NO:23 and a Fusarium oxysporum preprotrypsin gene. Comparison of amino acid sequence SEQ ID NO:24 (i.e., the amino acid sequence of PfSP32₂₆₈) with amino acid sequences reported in GenBank indicates that SEQ ID NO:24 showed the most homology, i.e., about 63% identity, between SEQ ID NO:24 and a Bombyx mori vitellin-degrading protease precursor protein.

D. The remainder of flea serine protease nucleic acid molecule clone 33 was determined using primers designed from nfSP33-, to amplify DNA from the flea mixed instar larvae cDNA library described above in Example 20. Sense primer Flea 33F having the nucleotide sequence 5' CAG GGC GCT CTG CAG AAC GCA AC 3' (denoted SEQ ID NO:50) was used in combination with the M13 universal primer in a first PCR reaction. Anti-sense primer Flea 33R having the nucleotide sequence 5' ATT CCT CGT GGT TCA GTC GCT C 3' (denoted SEQ ID NO:51) was used in combination with the M13 reverse primer in a second PCR reaction. The resulting PCR products were each gel purified and cloned into the TA Vector® System, and subjected to standard DNA sequencing techniques.

A composite nucleic acid sequence representing a flea serine protease coding region, referred to herein as nfSP33₁₈₉₄, was deduced and is denoted herein as SEQ ID NO:29. SEQ ID NO:84 and SEQ ID NO:82 are contained within the sequence of the nucleic acid molecule nfSP33₁₈₉₄. Translation of SEQ ID NO:29 suggests that nucleic acid molecule nfSP33₁₈₉₄ encodes a full-length flea serine protease protein of about 400 amino acids, referred to herein as PfSP33₄₀₀, having amino acid sequence SEQ ID NO:30, assuming an open reading frame in which the initiation codon spans from about nucleotide 335 through about nucleotide 337 of SEQ ID NO:29 and the termination codon spans from about nucleotide 1535 through about nucleotide 1537 of SEQ ID NO:29. The complement of SEQ ID NO:29 is represented herein by SEQ ID NO:31. The coding region encoding PfSP33₄₀₀ is represented by nucleic acid molecule nfSP33₁₂₀₀, having a coding strand with the nucleic acid sequence represented by SEQ ID NO:32 and a complementary strand with nucleic acid sequence SEQ ID NO:34. The proposed mature protein, denoted herein as PfSP33₂₄₂, contains about 242 amino acids and is represented herein as SEQ ID NO:36. The nucleic acid molecule encoding PfSP33₂₄₂ is denoted herein as nfSP33₇₂₆, which is represented by SEQ ID NO:35. The amino acid sequence of PfSP33₄₀₀ (i.e., SEQ ID NO:30) predicts that PfSP33₄₀₀ has an estimated molecular weight of about 44 kD and an estimated pI of about pI 7.59.

Comparison of nucleic acid sequence SEQ ID NO:29 with nucleic acid sequences reported in GenBank indicates that SEQ ID NO:29 showed the most homology, i.e., about 48% identity, between SEQ ID NO:29 and a Drosophila melanogaster serine protease stubble gene. Comparison of amino acid sequence SEQ ID NO:30 (i.e., the amino acid sequence of PfSP33₄₀₀) with amino acid sequences reported in GenBank indicates that SEQ ID NO:30 showed the most homology, i.e., about 63% identity, between SEQ ID NO:30 and a Drosophila melanogaster serine protease stubble protein.
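The percent identity figures reported in these comparisons are, in essence, the fraction of aligned positions at which two sequences carry the same residue. A minimal sketch of that calculation follows, assuming the two sequences have already been aligned to equal length with "-" marking gaps; the function name `percent_identity` and both example strings are hypothetical and do not come from the sequence listing, and database search tools may define the denominator differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical residues.

    Columns in which both sequences have a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Hypothetical seven-residue aligned fragments, for illustration only.
print(round(percent_identity("IVGGEDV", "IVGGTDA"), 1))
```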

Example 22 This example provides nucleic acid and deduced amino acid sequence of another nucleic acid molecule encoding a serine protease protein of the present invention.

A serine protease cDNA nucleic acid molecule has been isolated in a manner similar to that described in Example 8 of related PCT Publication No. WO 96/11706. The actual primers used in PCR amplification of the serine protease nucleic acid molecule from a cat blood-fed flea cDNA expression library (produced as described in Example 8 of related PCT Publication No. WO 96/11706) included cat-try #2 (SEQ ID NO:86) in combination with H57 primer (SEQ ID NO:99). The resultant PCR product was gel purified and cloned into the TA Vector™. A recombinant TA vector clone was isolated and subjected to nucleic acid sequencing.

A composite nucleic acid sequence of a flea serine protease nucleic acid molecule corresponding to flea clone 40, namely nfSP40₄₂₈, was deduced and is denoted herein as SEQ ID NO:37. Translation of SEQ ID NO:37 suggests that nucleic acid molecule nfSP40₄₂₈ encodes a non-full-length flea serine protease protein of about 142 amino acids, referred to herein as PfSP40₁₄₂, represented herein by SEQ ID NO:38. The complement of SEQ ID NO:37 is represented herein by SEQ ID NO:39.

The remainder of flea serine protease nucleic acid molecule clone 40 was determined using primers designed from nfSP40₄₂₈ to amplify DNA from the cat blood-fed whole flea cDNA library. Sense primer Flea 40F having the nucleotide sequence 5' GGC AAG TTT CGT TTC ACA ATA GG 3' (denoted SEQ ID NO:52) was used in combination with the M13 universal primer in a first PCR reaction. Anti-sense primer Flea 40R having the nucleotide sequence 5' TCC AAC CCT AAC TTT TAA ACC TTC 3' (denoted SEQ ID NO:53) was used in combination with the M13 reverse primer in a second PCR reaction. The resulting PCR products were each gel purified and cloned into the TA Vector® System, and subjected to standard DNA sequencing techniques.

A composite nucleic acid sequence representing a flea serine protease coding region, referred to herein as nfSP40₈₄₁, was deduced and is denoted herein as SEQ ID NO:40. SEQ ID NO:37 is contained within the sequence of the nucleic acid molecule nfSP40₈₄₁. Translation of SEQ ID NO:40 suggests that nucleic acid molecule nfSP40₈₄₁ encodes a non-full-length flea serine protease protein of about 242 amino acids, referred to herein as PfSP40₂₄₂, having amino acid sequence SEQ ID NO:41, assuming an open reading frame in which the first codon spans from about nucleotide 2 through about nucleotide 4 of SEQ ID NO:40 and the termination codon spans from about nucleotide 728 through about nucleotide 730 of SEQ ID NO:40. The complement of SEQ ID NO:40 is represented herein by SEQ ID NO:42. The coding region encoding PfSP40₂₄₂ is represented by nucleic acid molecule nfSP40₇₁₇, having a coding strand with the nucleic acid sequence represented by SEQ ID NO:43 and a complementary strand with nucleic acid sequence SEQ ID NO:45. The amino acid sequence of PfSP40₂₄₂ (i.e., SEQ ID NO:41) predicts that PfSP40₂₄₂ has an estimated molecular weight of about 26 kD and an estimated pI of about pI [...].

Comparison of nucleic acid sequence SEQ ID NO:40 with nucleic acid sequences reported in GenBank indicates that SEQ ID NO:40 showed the most homology, i.e., about 57% identity, between SEQ ID NO:40 and a Dermatophagoides pteronyssinus Der P3 allergen gene. Comparison of amino acid sequence SEQ ID NO:41 (i.e., the amino acid sequence of PfSP40₂₄₂) with amino acid sequences reported in GenBank indicates that SEQ ID NO:41 showed the most homology, i.e., about 40% identity, between SEQ ID NO:41 and a Bombyx mori vitellin-degrading protease precursor protein.

Example 23 This Example demonstrates the production of serine protease proteins of the present invention in E. coli cells.

A. Flea serine protease protein PfSP24₂₅₈ was produced in the following manner. An about 714 bp nucleic acid molecule, referred to herein as nfSP24₇₁₄ (designed to encode an apparently mature serine protease protein), was PCR amplified from nfSP24₁₀₈₉ using sense primer Flea 24 EF having the nucleotide sequence 5' CAC AGG ATC CAA TAA TTT GTG GTC AAA ATG C 3' (containing a BamHI site; denoted SEQ ID NO:54) and anti-sense primer Flea 24 ER having the nucleotide sequence 5' AAA AAG AAA GCT TCT TTA ATT TTC TGA CAT TGT CGT G 3' (containing a HindIII site; denoted SEQ ID NO:55). The resulting PCR product nfSP24₇₁₄ was digested with BamHI and HindIII restriction endonucleases, gel purified, and subcloned into expression vector lambdaPR/T2ori/S10HIS-RSET-A9 that had been digested with BamHI and HindIII and dephosphorylated. The resultant recombinant molecule, referred to herein as pCro-nfSP24₇₁₄, was transformed into E. coli BL-21 competent cells (available from Novagen, Madison, WI) to form recombinant cell E. coli:pCro-nfSP24₇₁₄. The recombinant cell was cultured as described in Example 20 of related PCT Publication No. WO 95/24198. About 1 ml of culture was collected prior to induction, and about 1 ml of culture was collected about 60 minutes following induction. These samples were then lysed in sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) loading buffer and resolved on a 14% Tris-glycine acrylamide gel.

Immunoblot analysis of the proteins using a T7 (tag) antibody (available from Novagen) showed expression of an about 36 kD protein in the induced sample but not in the uninduced sample.
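The restriction sites engineered into the expression primers of Section A can be located by a plain substring search for each enzyme's recognition sequence. The sketch below does this for primers Flea 24 EF and Flea 24 ER; the recognition sequences are the standard ones for these enzymes, while the dictionary `RECOGNITION_SITES` and the function `find_sites` are illustrative names assumed here.

```python
# Standard recognition sequences of the enzymes named in this Example.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "KpnI": "GGTACC",
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
}


def find_sites(primer: str) -> dict:
    """Map enzyme name to the 0-based offset of its first site in the primer."""
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(site)
            for name, site in RECOGNITION_SITES.items() if site in seq}


flea_24_ef = "CAC AGG ATC CAA TAA TTT GTG GTC AAA ATG C"         # SEQ ID NO:54
flea_24_er = "AAA AAG AAA GCT TCT TTA ATT TTC TGA CAT TGT CGT G"  # SEQ ID NO:55

print(find_sites(flea_24_ef))
print(find_sites(flea_24_er))
```

Each primer carries its site a few nucleotides in from the 5' end, leaving flanking bases that help the endonuclease cut close to the end of the PCR product.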

B. Flea serine protease protein PfSP32₂₆₈ was produced in the following manner. An about 698 bp nucleic acid molecule, referred to herein as nfSP32₆₉₈ (designed to encode an apparently mature serine protease protein), was PCR amplified from nfSP32 933 using sense primer Flea 32 EF having the nucleotide sequence 5' GCG GGA TCC TAT TGT GGG TGG TGA AGC AGT G 3' (containing a BamHI site; denoted SEQ ID NO:56) and anti-sense primer Flea 32 ER having the nucleotide sequence 5' GAC GGT ACC ATG TAT AAA ATA ATA TTA AAC TCC GG 3' (containing a KpnI site; denoted SEQ ID NO:57). The resulting PCR product nfSP32₆₉₈ was digested with BamHI and KpnI restriction endonucleases, gel purified, and subcloned into expression vector pTrcHisB (available from InVitrogen Corp., San Diego, CA) that had been digested with BamHI and KpnI and dephosphorylated. The resultant recombinant molecule, referred to herein as pTrc-nfSP32₆₉₈, was transformed into E. coli BL-21 competent cells to form recombinant cell E. coli:pTrc-nfSP32₆₉₈. The recombinant cell was cultured and protein production resolved by SDS-PAGE as described above in Section A. Immunoblot analysis of the proteins using a T7 antibody showed expression of an about 38 kD protein in the induced sample but not in the uninduced sample.

C. Flea serine protease protein PfSP33₄₀₀ was produced in the following manner. An about 1200 bp nucleic acid molecule, referred to herein as nfSP33₁₂₀₀ (designed to encode an apparently mature serine protease protein), was PCR amplified from nfSP33₁₈₉₄ using sense primer Flea 33 EF having the nucleotide sequence 5' CCG GGA TCC TAT GTT AGC GAT CGT CCC GTC AAA C 3' (containing a BamHI site; denoted SEQ ID NO:58) and anti-sense primer Flea 33 ER having the nucleotide sequence 5' CCG GAA TTC TTA TCC CAT TAC TTT GTC GAT CC 3' (containing an EcoRI site; denoted SEQ ID NO:59). The resulting PCR product nfSP33₁₂₀₀ was digested with BamHI and EcoRI restriction endonucleases, gel purified, and subcloned into expression vector lambdaPR/T2ori/S10HIS-RSET-A9 that had been digested with BamHI and EcoRI and dephosphorylated. The resultant recombinant molecule, referred to herein as pCro-nfSP33₁₂₀₀, was transformed into E. coli BL-21 competent cells to form recombinant cell E. coli:pCro-nfSP33₁₂₀₀. The recombinant cell was cultured using the method described above in Section A.

D. Flea serine protease protein PfSP40₂₄₂ was produced in the following manner. An about 716 bp nucleic acid molecule, referred to herein as nfSP40₇₁₆ (designed to encode an apparently mature serine protease protein), was PCR amplified from nfSP40₈₄₁ using sense primer Flea 40 EF having the nucleotide sequence 5' GCG GGA TCC AAT AGT AGG AGG TGA AGA TGT AG 3' (containing a BamHI site; denoted SEQ ID NO:60) and anti-sense primer Flea 40 ER having the nucleotide sequence 5' CCG GAA TTC TTC TAA CAA ATT TTA TTT GAT TCC TGC 3' (containing an EcoRI site; denoted SEQ ID NO:61). The resulting PCR product nfSP40₇₁₆ was digested with BamHI and EcoRI restriction endonucleases, gel purified, and subcloned into expression vector lambdaPR/T2ori/S10HIS-RSET-A9 that had been digested with BamHI and EcoRI and dephosphorylated. The resultant recombinant molecule, referred to herein as pCro-nfSP40₇₁₆, was transformed into E. coli BL-21 competent cells to form recombinant cell E. coli:pCro-nfSP40₇₁₆. The recombinant cell was cultured and protein production resolved using the methods described above in Section A. Immunoblot analysis of the proteins using a T7 antibody showed expression of an about 38 kD protein in the induced sample but not in the uninduced sample.

Example 24 This Example demonstrates the production of another serine protease protein of the present invention in E. coli cells.

A. Flea serine protease protein PfSP28₂₃₇ was produced in the following manner. An about 711 bp nucleic acid molecule, referred to herein as nfSP28₇₁₁ (designed to encode an apparently mature serine protease protein), was PCR amplified from nfSP28₉₂₃ using sense primer Flea 28 F having the nucleotide sequence 5' GGA TCC AAT CGT TGG AGG TGA AGA TG 3' (containing a BamHI site shown in bold; denoted SEQ ID NO:62) and anti-sense primer Flea 28 R having the nucleotide sequence 5' GAA TTC GAA ATC CAC TTA AAC ATT AGC 3' (containing an EcoRI site shown in bold; denoted SEQ ID NO:63). The resulting PCR product nfSP28₇₁₁ was digested with BamHI and EcoRI restriction endonucleases, gel purified, and subcloned into expression vector lambdaPR/T2ori/S10HIS-RSET-A9 that had been digested with BamHI and XbaI and dephosphorylated. The resultant recombinant molecule, referred to herein as pCro-nfSP28₇₁₁, was transformed into E. coli BL-21 competent cells (available from Novagen, Madison, WI) to form recombinant cell E. coli:pCro-nfSP28₇₁₁. The recombinant cell was cultured and protein production resolved using the methods described above in Example 20. Immunoblot analysis of the proteins using a T7 antibody showed expression of an about 38 kD protein in the induced sample but not in the uninduced sample. Immunoblot analysis using a rabbit anti-flea midgut protease polyclonal antibody (the production of which is described in Example 14 of related PCT Publication No. WO 95/24198) identified an about 38 kD protein in the induced sample.

Example 25 This Example demonstrates the production of another serine protease protein of the present invention in eukaryotic cells.

Recombinant molecule pBv-nfSP28₇₉₂, containing a flea serine protease nucleic acid molecule spanning nucleotides from about 11 through about 802 of SEQ ID NO:66, operatively linked to baculovirus polyhedron transcription control sequences, was produced in the following manner.

In order to subclone a flea serine protease nucleic acid molecule into a baculovirus expression vector, a flea serine protease nucleic acid molecule-containing fragment was PCR amplified from nfSP28₉₂₃. A PCR fragment of 792 nucleotides, named nfSP28₇₉₂, was amplified from nfSP28₉₂₃ using a sense primer Flea 28 F3 having the nucleic acid sequence 5'-GCG GGA TTC TAT AAA TAT GAA ACT TTT GGT AGT TTT TGC (SEQ ID NO:62; BamHI site shown in bold) and an antisense primer Flea 28 R3 having the nucleic acid sequence 5'-GCT CTA GAC CAC TTA AAC ATT AGC ATA TTT TTC-3' (SEQ ID NO:63; XbaI site shown in bold). The N-terminal primer was designed from the pol h sequence of baculovirus with modifications to enhance expression in the baculovirus system.

In order to produce a baculovirus recombinant molecule capable of directing the production of PfSP28₂₆₄, the about 792 base pair PCR product (referred to as Bv-nfSP28₇₉₂) was digested with BamHI and XbaI and subcloned into BamHI and XbaI digested to produce the recombinant molecule referred to herein as pVL-nfSP28₇₉₂. The resultant recombinant molecule, pVL-nfSP28₇₉₂, was verified for proper insert orientation by restriction mapping. The recombinant molecule was co-transfected with a linear Baculogold baculovirus DNA (available from Pharmingen) into S. frugiperda Sf9 cells (available from InVitrogen) to form the recombinant cells denoted S. frugiperda:pVL-nfSP28₇₉₂. S. frugiperda:pVL-nfSP28₇₉₂ was cultured in order to produce a flea serine protease protein PfSP28₂₆₄.

Immunoblot analysis of supernatants from cultures of S. frugiperda:pVL-nfSP28₇₉₂ cells producing the flea serine protease protein PfSP28₂₆₄ was performed using a cat anti-fSPFlea 28 polyclonal antibody which was produced as follows. Recombinant Flea 28 protein (referred to herein as rSPFlea 28 protein) produced in E. coli as described above in Example 24 was used to immunize cats. The rSPFlea 28 protein was diluted to a concentration of about 1 mg/ml in PBS and emulsified in an equal volume of TiterMax research adjuvant (available from CytRx Corp., Norcross, GA). A series of cats were immunized each with about 50 µg of rSPFlea 28 protein in adjuvant by subcutaneous injection.

A second injection of the same dose of rSPFlea 28 protein in adjuvant was administered 32 days later. Blood samples were obtained prior to immunization (pre-bleed), 32 days and 47 days after the initial immunization. Sera samples from the pre-immunization and Day 47 bleeds were used for subsequent immunoblot experiments. The latter is referred to as anti-fSPFlea 28 polyclonal antibody.
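The injection volume implied by the immunization protocol follows from the dilution and dose stated above. The sketch below works through that arithmetic, reading the per-cat dose as about 50 micrograms and assuming that emulsification in an equal volume of adjuvant simply halves the protein concentration; the function name `injection_volume_ml` and its parameters are illustrative, and the injected volume itself is not stated in the text.

```python
def injection_volume_ml(stock_mg_per_ml: float,
                        adjuvant_volumes_per_volume: float,
                        dose_ug: float) -> float:
    """Volume of emulsion (ml) that delivers the requested protein dose."""
    # Mixing 1 volume of protein with N volumes of adjuvant dilutes it (1 + N)-fold.
    final_mg_per_ml = stock_mg_per_ml / (1.0 + adjuvant_volumes_per_volume)
    return (dose_ug / 1000.0) / final_mg_per_ml


# 1 mg/ml protein, equal volume of adjuvant, about 50 micrograms per cat.
print(injection_volume_ml(1.0, 1.0, 50.0))
```

Under these assumptions each injection would be about 0.1 ml of emulsion.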

Analysis of the immunoblots identified an about 33 kD protein and an about 36 kD protein.

Example 26 This example describes the production of peptides from the 31 kD flea midgut serine protease and the generation of internal sequence data.

Midguts from about 30,000 cat blood-fed fleas were dissected as described in U.S. Patent No. 5,356,622, ibid., in gut dissection buffer (50 mM Tris, pH 8.0, 100 mM CaCl2). The guts (in three batches of about 10,000 each) were disrupted by a freeze-thaw cycle, followed by sonication. The resulting extracts were clarified by centrifugation for minutes at 3050 rpm in a swinging bucket centrifuge at 4°C.

The supernatants were recovered, and adjusted to 400 mM NaCl in preparation for benzamidine column chromatography.

For each batch, gut supernatants were loaded into a ml disposable column containing p-aminobenzamidine cross-linked to Sepharose beads (available from Sigma, St. Louis, MO), previously equilibrated in benzamidine column buffer (mM Tris, pH 8.0, 100 mM CaCl2, 400 mM NaCl), and incubated with rocking overnight at 4°C. Unbound protein was slowly washed off the column using benzamidine column buffer until no protein was detectable by Bradford Assay (available from Bio-Rad Laboratories, Hercules, CA).

Proteases bound to the benzamidine column were eluted using 4 ml benzamidine column buffer supplemented with 100 mM p-aminobenzamidine (brought to pH 8.0 with NaOH).

Residual bound proteases were washed off with about 21 ml of additional benzamidine column buffer. The recovered proteases were then concentrated to a volume of about 2 ml using an Ultrafree 20 10-kD centrifugal concentrator (available from Millipore, Bedford, MA). After concentration, the protease pools from the 3 preparations were combined for a total of about 30,000 gut equivalents in about 6 ml. Protein concentration was measured by Bradford assay and found to be about 0.5 mg/ml.
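The figures given for the combined pool allow a quick estimate of total yield and of yield per gut equivalent. The sketch below performs that calculation from the approximate values stated above; the function name `pool_yield` and the returned keys are illustrative assumptions, and the results inherit the "about" of each input.

```python
def pool_yield(volume_ml: float, conc_mg_per_ml: float, gut_equivalents: int) -> dict:
    """Total protein in the pool and protein per gut equivalent."""
    total_mg = volume_ml * conc_mg_per_ml
    return {
        "total_mg": total_mg,
        "ug_per_gut_equivalent": total_mg * 1000.0 / gut_equivalents,
    }


# About 6 ml at about 0.5 mg/ml from about 30,000 gut equivalents.
print(pool_yield(6.0, 0.5, 30_000))
```

This corresponds to roughly 3 mg of benzamidine-binding protein in total, or on the order of 0.1 microgram per flea midgut.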

About 150 µg of the isolated protease pool was resolved by polyacrylamide gel electrophoresis (PAGE) on a preparative-well 14% Tris-glycine gel (available from Novex, San Diego, CA). After electrophoresis, the proteins in the gel were visualized by staining for about 30 minutes in Coomassie brilliant blue stain (Coomassie blue R, 40% methanol, 10% acetic acid) and destaining for about 2.5 hours in 50% methanol. The band corresponding to the 31-kD protease was excised with a razor blade. The protein was electroeluted, concentrated, and partially digested for 24 hours with cyanogen bromide (CNBr) (Silver, et al., 1995, J. Biol. Chem., 270, 13010-13016), except that a small amount of acetic acid was added to the sample after electroelution and concentration to lower the sample pH and therefore reduce autodigestion by the 31-kD protease. CNBr is known to cleave after methionine residues under the conditions used for this digestion. After CNBr digestion, the peptides in the sample were resolved by PAGE on an 18% Tris-glycine gel. After electrophoresis, the separated protease peptides were electroblotted onto a PVDF membrane using a CAPS buffer (10 mM CAPS pH 11, 0.5 mM DTT, methanol). The membrane was stained with Coomassie Brilliant Blue and destained with 50% methanol.
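Because CNBr cleaves after methionine residues, the peptides expected from a complete digest can be listed by splitting a protein sequence after each M. The sketch below illustrates this; the function name `cnbr_fragments` and the example sequence are hypothetical and are not taken from the sequence listing, and a partial digest such as the one described above would additionally contain uncleaved combinations of adjacent fragments.

```python
def cnbr_fragments(protein: str) -> list:
    """Fragments from complete cleavage after every methionine (M)."""
    fragments, current = [], ""
    for residue in protein:
        current += residue
        if residue == "M":          # cleavage occurs C-terminal to Met
            fragments.append(current)
            current = ""
    if current:                     # trailing peptide with no Met at its end
        fragments.append(current)
    return fragments


# Hypothetical sequence for illustration only.
print(cnbr_fragments("IVGGEDVMASTCGWMKR"))
```

Every internal fragment therefore begins at the residue immediately following a methionine, which is the property later used to place an internal peptide within a deduced amino acid sequence.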

Three stained peptide bands were identified having apparent molecular weights of about 14 kD, 21 kD, and 22 kD, respectively. The portions of the membrane containing the 21 kD and 22 kD bands were excised separately. Peptides contained in each membrane segment were subjected to N-terminal amino acid sequencing using a 473A Protein Sequencer (available from Applied Biosystems, Foster City, CA) using standard techniques.

Although the results from the automated sequencing were difficult to interpret due to overlapping sequences, analysis of the chromatograms indicated the N-terminal amino acid sequence of the 21-kD peptide to be H/R-V/P-G/A/S-Y/G-E/N-D/K-V/R-D/A-D-Y-D-F-D/P-V-A, denoted herein as SEQ ID NO:70, and the N-terminal amino acid sequence of the 22-kD peptide to be I/Q-V-G-Y/G-E/N/T-D/M/P-V-K/D-I-N/S-M/T/N-F/C, herein denoted as SEQ ID NO:71. The N-terminal amino acid sequence of the intact 31-kD protease is either I-V-G-G-E-D-V-D-I-S-T-C-G-W-C (SEQ ID NO:68, as disclosed in Example 34 in co-pending U.S. Patent Application Serial No. 08/639,075), or IVGGEDVDIST(C)GWQI(S)FQ(S)ENLHF(C)GG(S)IIAPK (SEQ ID NO:69, as disclosed in Example 35 in co-pending U.S. Patent Application Serial No. 08/639,075). These sequences vary at residue 15 in that SEQ ID NO:68 contains a cysteine and SEQ ID NO:69 contains a glutamine. These sequences can be identified in the sequences of both the 21-kD (SEQ ID NO:70) and 22-kD (SEQ ID NO:71) peptides, though they are much stronger in the 22-kD peptide, leading to the conclusion that SEQ ID NO:71 represents the N-terminus of the 31-kD protease. If this sequence is subtracted from the sequence for the 21-kD (SEQ ID NO:70) peptide, then the resulting sequence for the 21-kD peptide is H/R-P-A/S-Y-N-K-R-A-D-Y-D-F-D-V-A, denoted herein as SEQ ID NO:72. This sequence of amino acids aligns with a stretch of deduced amino acids from about residue 107 to residue 121 immediately following a methionine residue in SEQ ID NO:67.

These data confirm that the clone represented by nfSP28 923 (SEQ ID NO:66, as disclosed in Example 36 in co-pending U.S. Patent Application Serial No. 08/639,075) indeed encodes the 31-kD protease.

Example 27 This example demonstrates that a 31-kD flea midgut serine protease contained in a formulation is able to proteolyze cat immunoglobulin G, A, and M proteins as well as bovine, dog, human, and rabbit immunoglobulin G proteins.

The 31-kD flea midgut serine protease was purified from cat blood-fed fleas as follows. Cat blood-fed flea midgut extracts were prepared and selected on a benzamidine column as described above in Example 26. The benzamidine eluate was then further purified as described in Example of co-pending U.S. Patent Application Serial No. 08/639,075 by PolyCAT A cation exchange chromatography (available from PolyLC, Inc., Columbia, MD) to isolate a protein band which migrated at about 31 kD on a silver stained SDS-PAGE gel.

A. The ability of the cat blood-fed 31-kD flea midgut serine protease to degrade immunoglobulin was demonstrated by measuring digestion of immunoglobulin heavy chain using a method similar to that described in Example 35 of co-pending U.S. Patent Application Serial No. 08/639,075. Specifically, 1 µg samples of cat IgG, cat IgA, and cat IgM substrates (available from Bethyl Laboratories, Inc., Montgomery, Texas) were incubated with 500 cat blood-fed flea midgut equivalents of purified 31-kD flea midgut serine protease in a total volume of 27 µl 0.1 M Tris-HCl pH at 37°C for 18 hours. The reaction mixtures were resolved by 14% Tris-glycine SDS-PAGE and the gel was silver stained using standard methods. The total disappearance (compared with control samples lacking addition of the purified 31-kD protein) of bands migrating at about 50, 60, and 80 kD on the silver stained gel in the lanes containing 31-kD protease-treated cat IgG, IgA, and IgM, respectively, indicated that the 31-kD flea midgut serine protease degraded the heavy chains of the various cat immunoglobulin isotypes.

B. The ability of the cat blood-fed 31-kD flea midgut serine protease to degrade IgG from several species was demonstrated by incubating 1 µg samples of purified cat or bovine IgG (purified from cat and bovine blood on Protein A Sepharose), or of purified dog, rabbit, or human IgG (each available from Sigma Chemical Co.) with 500 cat blood-fed flea midgut equivalents of the purified 31-kD flea midgut serine protease in a total volume of 27 µl 0.1 M Tris-HCl pH 8.0 at 37°C for 18 hours. The reaction mixtures were resolved by 14% Tris-glycine SDS-PAGE and the gel was silver stained using standard methods. The total disappearance (compared with control samples lacking addition of the purified 31-kD protein) of bands migrating at about 50-55 kD on the silver stained gel in the lanes containing the 31-kD protease-treated cat, bovine, dog, rabbit, and human IgG heavy chains indicated that the 31-kD flea midgut serine protease can degrade IgG from various mammalian species.

Example 28 This example describes the ability of a 31-kD flea midgut serine protease contained in a formulation to proteolyze cat immunoglobulin G at a specific site.

The 31-kD flea midgut serine protease was purified from cat blood-fed flea midgut extracts as described above in Examples 26 and 27.

To investigate cleavage site specificity of the purified 31-kD flea midgut serine protease, 10 µg of cat immunoglobulin G purified from cat blood on Protein A Sepharose was incubated with 200 cat blood-fed flea midgut equivalents of purified 31-kD flea midgut serine protease in a total volume of 100 µl 0.2 M Tris-HCl pH 8.0 at 37°C for 18 hours. The reaction mixture was resolved by 14% Tris-glycine SDS-PAGE, blotted onto a PVDF membrane, stained with Coomassie R-250 and destained according to standard procedures. A band of about 33 kD was excised and subjected to N-terminal amino acid sequencing using techniques known to those skilled in the art. A partial N-terminal amino acid sequence of about 28 amino acids was determined and is represented herein as SEQ ID NO:73: X-

A GenBank homology search using SEQ ID NO:73 revealed most homology to Oryctolagus cuniculus gamma H-chain constant region 2, there being about 71% identity over the 28 amino acids. Further alignments of SEQ ID NO:73 with sheep, rat, rabbit, monkey, bovine, and human IgG amino acid sequences indicated that the purified cat blood-fed 31-kD flea midgut serine protease cleaved the cat IgG heavy chain just before the predicted C-terminal end of the IgG hinge region. The predicted first amino acid cysteine and the second amino acid proline occur within the predicted hinge region, while the remaining 26 amino acids of SEQ ID NO:73, starting with the third amino acid proline, occur within the predicted constant heavy chain-2 region.

To further investigate the cleavage site specificity of the purified 31-kD flea midgut serine protease for cat IgG, the cleavage site was compared to that of a known protease, papain, as follows. Cat immunoglobulin G (100 mg), purified from cat blood on Protein A Sepharose, was incubated with 1 mg papain in 100 mM sodium acetate pH mM cysteine, 1 mM EDTA in a final volume of 150 µl at room temperature for 4.5 hours. The reaction mixture was resolved on a 14% Tris-glycine SDS-PAGE gel, blotted onto PVDF membrane, stained with Coomassie R-250 and destained according to standard procedures. A band of about 33 kD was excised and subjected to N-terminal amino acid sequencing using techniques known to those skilled in the art. A partial N-terminal amino acid sequence of about amino acids was deduced and is represented herein as SEQ ID NO:96: X-P-P-P-E-M-L-G-G-P-S-I-F-I-F-P-P-K-K-K-D-D-L-L-I.

This sequence was nearly identical to the one obtained from a 33-kD cat IgG cleavage product generated by the purified 31-kD flea midgut serine protease, the only difference being the substitution of a lysine (SEQ ID NO:73) for a proline (SEQ ID NO:96) at amino acid 19.

Example 29 This Example demonstrates the kinetics of cat IgG degrading activity in the midguts of fleas fed on live cats.

A. To determine the kinetics of cat IgG degradation in the guts of continuously feeding fleas, female fleas contained in chambers were fed on seven separate cats (i.e., one chamber per cat) as described in Example 21 of co-pending U.S. Patent Application Serial No. 08/639,075. Flea chambers were removed for dissections at timepoints of min., 30 min., 1 hr., 2 hr., 4 hr., 6 hr., 8 hr., and 17 hr. After feeding on the cats, the fleas' midguts were removed as described in U.S. Patent No. 5,356,622, ibid., homogenized by freeze-fracture and sonicated in a Tris buffer comprising 50 mM Tris, pH 8.0 and 100 mM CaCl2. The extracts were centrifuged at about 14,000 x g for 20 min. and the soluble material recovered. The soluble material was then diluted to a final concentration of about 1.2 midgut equivalents per microliter (µl) of Tris buffer. The proteins contained in 1 midgut equivalent of each timepoint were then resolved by SDS-PAGE under reducing conditions, and the proteins visualized by silver staining. The results indicated that IgG heavy chain levels were significantly lower at the 17 hour time point than in the 8 hour and earlier time points, and that light chain levels were reduced but not to the same extent as the heavy chain.

The proteins contained in 5 gut equivalents of each timepoint were then resolved by SDS-PAGE under reducing conditions and were subjected to western blot analysis using alkaline phosphatase labeled goat anti-cat IgG (heavy plus light chain) antibody (available from Kirkegaard and Perry Laboratories, Gaithersburg, MD). The results indicated that cat IgG heavy chain was present in the midguts of continuously feeding fleas for at least 8 hours, but was not detected in the midguts of fleas allowed to feed continuously on a cat for 17 hours. Light chain was visible in all samples, though the amount visible in the 17 hour sample was significantly less than that visible in the 8 hour sample. These results suggest that even when fleas are continuously feeding on a cat, the levels of IgG-degrading proteases induced in the flea midguts at a time point of 17 hours are sufficient to degrade all detectable cat IgG ingested. These results suggest that when fleas are continuously feeding on a cat, the levels of IgG-degrading proteases induced in the flea midguts are not sufficient to degrade all detectable cat IgG ingested for at least 8 hours.

B. To determine the kinetics of cat IgG degradation in the guts of fleas fed for a specified time then removed from the cat, fleas (in chambers) were fed on cats as in Section A for periods of either 1 hour or 24 hours.

Following the 1 or 24-hour feeding periods, the flea chambers were removed and placed in a 28°C, 75% relative humidity growth incubator. Fleas were subjected to dissection at time points of 0, 1, 2, 4, and 8 hr. following removal from the cats. Midguts were homogenized, and the midgut contents were examined by silver stained SDS-PAGE and immunoblot analysis, as described in Section A. The fleas fed for 1 hour had high molecular weight proteins, including the heavy chain and light chain of cat IgG, detectable in their midguts at the 0 and 1 hour dissection timepoints, while the flea midguts evaluated at time points of 2 hours or greater had no detectable IgG heavy chain bands. The results showed that when fleas were fed on a cat and then removed, they degraded the ingested cat IgG heavy chain nearly completely within 2 hours. The fleas fed on cats for 24 hours had no detectable IgG heavy or light chain proteins in midgut extracts at any of the timepoints. These results suggest that when no new cat IgG is ingested, as is the case when the fleas are removed from feeding, the IgG-degrading proteases in the flea midgut fully degrade all cat IgG heavy chain in less than two hours.

Example This example describes the ability of a 31-kD flea midgut serine protease contained in a formulation to proteolyze cat immunoglobulin G at a specific site.

The 31-kD flea midgut serine protease was purified from cat blood-fed flea midgut extracts as described above in Examples 14 and To further investigate cleavage site specificity of the purified 31-kD flea midgut serine protease, microgram of cat immunoglobulin G purified from cat blood on Protein A sepharose was incubated with 200 cat blood-fed flea midgut equivalents of purified 31-kD flea midgut serine protease in a total volume of 300 microliters of mM Tris-HCl pH 8.0 at 37°C. About 10 microliter aliquots of the incubation mixture were removed at 1, 2, 4, 6, 8, 12, 16 and 24 hours after initiation of the incubation. Following removal, each aliquot was mixed with about 1 microliter of 20 mM p-aminobenzamidine and stored at -80°C.

Each aliquot from the incubation mixture was resolved by 14% Tris-glycine SDS-PAGE, blotted onto a PVDF membrane, stained with Coomassie R-250 and destained according to standard procedures. A band of about 34 kD was identified in the aliquot removed after 1 hour of incubation. The band was excised and subjected to N-terminal amino acid sequencing using techniques known to those skilled in the art. A partial N-terminal amino acid sequence of about 25 amino acids was determined and is represented herein as SEQ ID NO:104: D-C-P-K-C-P-P-P-E-M-L-G-G-P-S-I-F-I-F-P-P-K-P-K-D. An additional 10 amino acids were also obtained beyond the last amino acid of SEQ ID NO:104. A combined amino acid sequence of about 35 amino acids was determined and is represented herein as SEQ ID NO:105: D-C-P-K-C-P-P-P-E-M-L-G-G-P-S-I-F-I-F-P-P-K-P-K-D-D-L-L-I-K-R-K-S-E-V.

A GenBank homology search using SEQ ID NO:105 revealed most homology to the Homo sapiens immunoglobulin gamma 3 heavy chain constant region 2 exon hinge IGHG3 gene (GenBank Accession No. X99549), there being about 69% identity over the 35 amino acids. Further alignments of SEQ ID NO:105 with cat, rabbit, bovine and human IgG amino acid sequences indicated that purified cat blood-fed 31-kD flea midgut serine protease cleaved the cat IgG heavy chain about 6 amino acids before the predicted C-terminal end of the IgG hinge region. The first 6 amino acids, aspartic acid, cysteine, proline, lysine, cysteine and proline, occur within the predicted hinge region, while the remaining 29 amino acids of SEQ ID NO:105, starting with the seventh amino acid proline, occur within the predicted constant heavy chain-2 region.
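The hinge/CH2 assignment described above can be sketched in a few lines. The sketch below takes SEQ ID NO:105 as written in the text and splits it after the sixth residue; the boundary position, the helper name and the small residue-name table are assumptions made for illustration only.

```python
# SEQ ID NO:105 in one-letter code, as given in the text (35 residues).
SEQ_ID_NO_105 = "DCPKCPPPEMLGGPSIFIFPPKPKDDLLIKRKSEV"

# Assumption: the predicted hinge region ends after residue 6 of the fragment.
HINGE_LENGTH = 6

# Only the residues needed to name the hinge portion are listed here.
RESIDUE_NAMES = {
    "D": "aspartic acid",
    "C": "cysteine",
    "P": "proline",
    "K": "lysine",
}


def split_fragment(sequence: str, hinge_length: int) -> tuple[str, str]:
    """Split a cleavage fragment into its hinge part and its CH2 part."""
    return sequence[:hinge_length], sequence[hinge_length:]


hinge_part, ch2_part = split_fragment(SEQ_ID_NO_105, HINGE_LENGTH)
hinge_names = [RESIDUE_NAMES[residue] for residue in hinge_part]

print("hinge residues:", ", ".join(hinge_names))
print("CH2 residues:", len(ch2_part), "starting with", ch2_part[0])
```

With these inputs the hinge part is the six residues named in the text and the CH2 part contains the remaining 29 residues, beginning with proline.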

A comparison of SEQ ID NO:105 and an amino acid sequence for portions of the hinge and CH2 regions of cat IgG (represented herein by SEQ ID NO:106) is shown below:

SEQ ID NO:105              D-C-P-K-C-P-P-P-E-M-L-G-G-P-S-I-F-I-F-P-P-K-P-K-
SEQ ID NO:106              D-C-P-K-C-P-P-P-E-M-L-G-G-P-S-I-F-I-F-P-P-K-P-K-

SEQ ID NO:105 (continued)  -D-D-L-L-I-K-R-K-S-E-V
SEQ ID NO:106 (continued)  -D-T-L-S-I-S-R-T-P-E-V

Discrepancies between SEQ ID NO:105 and SEQ ID NO:106 are shown in bold. Applicants believe that the difference between SEQ ID NO:105 and SEQ ID NO:106 may be due to sequencing error of the last 10 amino acids of SEQ ID NO:105.
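The position-by-position comparison can also be checked programmatically. The following is a minimal sketch, assuming the two sequences exactly as written in the text (one-letter code, ungapped, equal length); the function name is hypothetical.

```python
# Sequences as given in the text, one-letter code, 35 residues each.
SEQ_ID_NO_105 = "DCPKCPPPEMLGGPSIFIFPPKPKDDLLIKRKSEV"
SEQ_ID_NO_106 = "DCPKCPPPEMLGGPSIFIFPPKPKDTLSISRTPEV"


def find_mismatches(first: str, second: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in first, residue in second) for each difference."""
    if len(first) != len(second):
        raise ValueError("ungapped comparison requires sequences of equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(first, second), start=1)
        if a != b
    ]


mismatches = find_mismatches(SEQ_ID_NO_105, SEQ_ID_NO_106)
identity_percent = 100 * (len(SEQ_ID_NO_105) - len(mismatches)) / len(SEQ_ID_NO_105)

for position, observed, reference in mismatches:
    print(f"position {position}: {observed} (SEQ ID NO:105) vs {reference} (SEQ ID NO:106)")
print(f"identity: {identity_percent:.1f}%")
```

On the sequences as transcribed here, the differences fall at positions 26, 28, 30, 32 and 33, all within the last 10 residues of SEQ ID NO:105, which is consistent with the applicants' explanation.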

SEQUENCE LISTING

The following Sequence Listing is submitted pursuant to 37 CFR §1.821. A copy in computer readable form is also submitted herewith.

Applicants assert pursuant to 37 CFR §1.821(f) that the content of the paper and computer readable copies of SEQ ID NO:1 through SEQ ID NO:165 submitted herewith are the same.

(1) GENERAL INFORMATION:

(i) APPLICANT: Grieve, Robert B.
               Rushlow, Keith E.
               Wu Hunter, Shirley
               Frank, Glenn R.
               Stiegler, Gary
               Gaines, Patrick J.
               Silver, Gary
(ii) TITLE OF INVENTION: FLEA PROTEASE PROTEINS, NUCLEIC ACID MOLECULES AND USES THEREOF
(iii) NUMBER OF SEQUENCES: 165
(iv) CORRESPONDENCE ADDRESS:
     ADDRESSEE: Sheridan Ross P.C.
     STREET: 1700 Lincoln Street, Suite 3500
     CITY: Denver
     STATE: Colorado
     COUNTRY: USA
     ZIP: 80203
(v) COMPUTER READABLE FORM:
     MEDIUM TYPE: Floppy disk
     COMPUTER: IBM PC compatible
     OPERATING SYSTEM: PC-DOS/MS-DOS
     SOFTWARE: PatentIn Release Version #1.30
(vi) CURRENT APPLICATION DATA:
     APPLICATION NUMBER:
     FILING DATE:
     CLASSIFICATION:
(viii) ATTORNEY/AGENT INFORMATION:
     NAME: Connell, Gary J.

REGISTRATION NUMBER: 32,020 REFERENCE/DOCKET NUMBER: 2618-25-C4-PCT (ix) TELECOMMUNICATION INFORMATION: TELEPHONE: (303) 863-9700 TELEFAX: (303) 863-0223 WO 97/40058 PCTIUS97/06121 166 2) INFORMATION FOR SEQ ID NO:1: SEQUENCE CHARACTERISTICS: LENGTH: 1109 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 126..1106 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: TACAATTAGG GTGACGATCT TTTAGACAAG CGAAATTGAT AACGAAGTTT TCGAAGTCGG ATGAAGTAAA AACCTTGCGT TGGTTTCCCC GGTCCCAGGA TCAGGAACAG TTGCACTTTA 120 CCCCA ATG AGG GAA TTC GTG CAT CCC CAT TTT ACC GAA CAT ATT GAT 167 Met Arg Glu Phe Val His Pro His Phe Thr Glu His Ile Asp 1 5 GAA GAA TTC CAC CGA TTC ATC AAT AAA CAC GGA AAA ATT TAT AAT AAA 215 Glu Glu Phe His Arg Phe Ile Asn Lys His Gly Lys Ile Tyr Asn Lys 20 25 AAT GAA GAA CAT CAT TTC CGC AAA GAA ATT TTC AGA CTA AAC TTG AGG 263 Asn Glu Glu His His Phe Arg Lys Glu Ile Phe Arg Leu Asn Leu Arg 40 TAC ATT TTT TCT AAG AAT CGT GCA AAT TTG GGA TAC ACT TTG ACT GTT 311 Tyr Ile Phe Ser Lys Asn Arg Ala Asn Leu Gly Tyr Thr Leu Thr Val 50 55 AAC CAT TTG GCT GAT CGT ACT GAA GCT GAA CTT AAG GCT TTG AGA GGA 359 Asn His Leu Ala Asp Arg Thr Glu Ala Glu Leu Lys Ala Leu Arg Gly 70 CAC AGA CCT TCC TCC GGT TAT AAT GGC GGT TTA CCC TTT CCT CAC AAT 407 His Arg Pro Ser Ser Gly Tyr Asn Gly Gly Leu Pro Phe Pro His Asn 85 ACC ACC AAG GAA GCA AGA AAT TTA CCA GAT TCT TTC GAC TGG CGA ATT 455 Thr Thr Lys Glu Ala Arg Asn Leu Pro Asp Ser Phe Asp Trp Arg Ile 100 105 110 TAT GGA GCT GTT ACT CCA GTT AAA GAT CAA TCT GTT TGT GGT TCC TGC 503 Tyr Gly Ala Val Thr Pro Val Lys Asp Gln Ser Val Cys Gly Ser Cys 115 120 125 TGG TCT TTC GGA ACA ATT GGA GCA ATC GAA GGT GCA TAT TTC TTG AAA 551 Trp Ser Phe Gly Thr Ile Gly Ala Ile Glu Gly Ala Tyr Phe Leu Lys 130 135 140 AAC GGC GGT AAT CTT GTA CGA TTG TCT CAA CAG GCT TTG ATT GAT TGT 599 Asn Gly Gly Asn Leu Val Arg Leu Ser Gln Gln Ala Leu Ile Asp Cys 145 150 155 WO 97/40058 PCT/US97/06121.

167 TCT TGG GGA TAT GGA AAT AAT GGT TGC GAC GGT GGC GAA GAT TTC CGC 647 Ser Trp Gly Tyr Gly Asn Asn Gly Cys Asp Gly Gly Glu Asp Phe Arg 160 165 170 GCC TAC CAA TGG ATG ATG AAA CAT GGA GGA ATC CCT ACT GAA GAG GAT 695 Ala Tyr Gin Trp Met Met Lys His Gly Gly Ile Pro Thr Glu Glu Asp 175 180 185 190 TAT GGT GGT TAC TTG GGA CAA GAT GGT TAC TGC CAT GTC AAC AAC GTT 743 Tyr Gly Gly Tyr Leu Gly Gin Asp Gly Tyr Cys His Val Asn Asn Val 195 200 205 ACT TTA GTT GCT CCC ATC ACA GGA TAT GTC AAC GTA ACT CGT AAC GAT 791 Thr Leu Val Ala Pro Ile Thr Gly Tyr Val Asn Val Thr Arg Asn Asp 210 215 220 GTT GAC GCT ATG AAG GTT GCC CTT CTT AAA CAC GGT CCG ATT TCG GTG 839 Val Asp Ala Met Lys Val Ala Leu Leu Lys His Gly Pro Ile Ser Val 225 230 235 GCC ATT GAC GCA TCA CAC AAA ACA TCC AGT TTT TAC TCC AAC GGC GTT 887 Ala Ile Asp Ala Ser His Lys Thr Ser Ser Phe Tyr Ser Asn Gly Val 240 245 250 TAC TAC CAA CCG AAA TGT GGC AAT AAA AGA GGA CAA TTA GAC CAC GCC 935 Tyr Tyr Gin Pro Lys Cys Gly Asn Lys Arg Gly Gin Leu Asp His Ala 255 260 265 270 GTA TTA GTA GTC GGT TAT GGT GAA ATC AAC AGC GAA CCT TAC TGG TTG 983 Val Leu Val Val Gly Tyr Gly Glu Ile Asn Ser Glu Pro Tyr Trp Leu 275 280 285 GTC AAG AAT CCT GGT CAA TTG TGG GGA AAC AAT GTT ATA TTT GAT GTC 1031 Val Lys Asn Pro Gly Gin Leu Trp Gly Asn Asn Val Ile Phe Asp Val 290 295 300 GCC AAA AAT AAT AAT GCG GAT TTG ACG ATC ACT TAT TTA CTA TGT ACT 1079 Ala Lys Asn Asn Asn Ala Asp Leu Thr Ile Thr Tyr Leu Leu Cys Thr 305 310 315 ACT TTT AAA ATT GAT TTT TAC GGC ATG TAA 1109 Thr Phe Lys Ile Asp Phe Tyr Gly Met 320 325 INFORMATION FOR SEQ ID NO:2: SEQUENCE CHARACTERISTICS: LENGTH: 327 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: Met Arg Glu Phe Val His Pro His Phe Thr Glu His Ile Asp Glu Glu 1 5 10 Phe His Arg Phe Ile Asn Lys His Gly Lys Ile Tyr Asn Lys Asn Glu 25 WO 97/40058 PCT/US97/06121 168 Glu His His Phe Arg Lys Glu Ile Phe Arg Leu Asn Leu Arg Tyr Ile 40 Phe Ser Lys Asn Arg Ala Asn Leu Gly Tyr 
Thr Leu Thr Val Asn His 55 Leu Ala Asp Arg Thr Glu Ala Glu Leu Lys Ala Leu Arg Gly His Arg 70 75 Pro Ser Ser Gly Tyr Asn Gly Gly Leu Pro Phe Pro His Asn Thr Thr 90 Lys Glu Ala Arg Asn Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly 100 105 110 Ala Val Thr Pro Val Lys Asp Gin Ser Val Cys Gly Ser Cys Trp Ser 115 120 125 Phe Gly Thr Ile Gly Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly 130 135 140 Gly Asn Leu Val Arg Leu Ser Gin Gin Ala Leu Ile Asp Cys Ser Trp 145 150 155 160 Gly Tyr Gly Asn Asn Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr 165 170 175 Gin Trp Met Met Lys His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly 180 185 190 Gly Tyr Leu Gly Gin Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu 195 200 205 Val Ala Pro Ile Thr Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp 210 215 220 Ala Met Lys Val Ala Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile 225 230 235 240 Asp Ala Ser His Lys Thr Ser Ser Phe Tyr Ser Asn Gly Val Tyr Tyr 245 250 255 Gin Pro Lys Cys Gly Asn Lys Arg Gly Gin Leu Asp His Ala Val Leu 260 265 270 Val Val Gly Tyr Gly Glu Ile Asn Ser Glu Pro Tyr Trp Leu Val Lys 275 280 285 Asn Pro Gly Gin Leu Trp Gly Asn Asn Val Ile Phe Asp Val Ala Lys 290 295 300 Asn Asn Asn Ala Asp Leu Thr Ile Thr Tyr Leu Leu Cys Thr Thr Phe 305 310 315 320 Lys Ile Asp Phe Tyr Gly Met 325 WO 97/40058 PCT/US97/06121 169 INFORMATION FOR SEQ ID NO:3: SEQUENCE CHARACTERISTICS: LENGTH: 1109 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: TTACATGCCG TAAAAATCAA TTTTAAAAGT AGTACATAGT AAATAAGTGA TCGTCAAATC

CGCATTATTA TTTTTGGCGA CATCAAATAT AACATTGTTT CCCCACAATT GACCAGGATT  120
CTTGACCAAC CAGTAAGGTT CGCTGTTGAT TTCACCATAA CCGACTACTA ATACGGCGTG  180
GTCTAATTGT CCTCTTTTAT TGCCACATTT CGGTTGGTAG TAAACGCCGT TGGAGTAAAA  240
ACTGGATGTT TTGTGTGATG CGTCAATGGC CACCGAAATC GGACCGTGTT TAAGAAGGGC  300
AACCTTCATA GCGTCAACAT CGTTACGAGT TACGTTGACA TATCCTGTGA TGGGAGCAAC  360
TAAAGTAACG TTGTTGACAT GGCAGTAACC ATCTTGTCCC AAGTAACCAC CATAATCCTC  420
TTCAGTAGGG ATTCCTCCAT GTTTCATCAT CCATTGGTAG GCGCGGAAAT CTTCGCCACC  480
GTCGCAACCA TTATTTCCAT ATCCCCAAGA ACAATCAATC AAAGCCTGTT GAGACAATCG  540
TACAAGATTA CCGCCGTTTT TCAAGAAATA TGCACCTTCG ATTGCTCCAA TTGTTCCGAA  600
AGACCAGCAG GAACCACAAA CAGATTGATC TTTAACTGGA GTAACAGCTC CATAAATTCG  660
CCAGTCGAAA GAATCTGGTA AATTTCTTGC TTCCTTGGTG GTATTGTGAG GAAAGGGTAA  720
ACCGCCATTA TAACCGGAGG AAGGTCTGTG TCCTCTCAAA GCCTTAAGTT CAGCTTCAGT  780
ACGATCAGCC AAATGGTTAA CAGTCAAAGT GTATCCCAAA TTTGCACGAT TCTTAGAAAA  840
AATGTACCTC AAGTTTAGTC TGAAAATTTC TTTGCGGAAA TGATGTTCTT CATTTTTATT  900
ATAAATTTTT CCGTGTTTAT TGATGAATCG GTGGAATTCT TCATCAATAT GTTCGGTAAA  960
ATGGGGATGC ACGAATTCCC TCATTGGGGT AAAGTGCAAC TGTTCCTGAT CCTGGGACCG  1020
GGGAAACCAA CGCAAGGTTT TTACTTCATC CGACTTCGAA AACTTCGTTA TCAATTTCGC  1080
TTGTCTAAAA GATCGTCACC CTAATTGTA  1109

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
     LENGTH: 984 base pairs
     TYPE: nucleic acid
     STRANDEDNESS: single
     TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
     NAME/KEY: CDS
     LOCATION: 1..981
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

ATG AGG GAA TTC GTG CAT CCC CAT TTT ACC GAA CAT ATT GAT GAA GAA  48
Met Arg Glu Phe Val His Pro His Phe Thr Glu His Ile Asp Glu Glu
1               5                   10                  15

TTC CAC CGA TTC ATC AAT AAA CAC GGA AAA ATT TAT AAT AAA AAT GAA  96
Phe His Arg Phe Ile Asn Lys His Gly Lys Ile Tyr Asn Lys Asn Glu
            20                  25                  30

GAA CAT CAT TTC CGC AAA GAA ATT TTC AGA CTA AAC TTG AGG TAC ATT  144
Glu His His Phe Arg Lys Glu Ile Phe Arg Leu Asn Leu Arg Tyr Ile
        35                  40                  45

TTT TCT AAG AAT CGT GCA AAT TTG GGA TAC ACT TTG ACT GTT AAC CAT  192
Phe Ser Lys Asn Arg Ala Asn Leu Gly Tyr Thr Leu Thr Val Asn His
    50                  55                  60

TTG GCT GAT CGT ACT GAA GCT GAA CTT AAG GCT TTG AGA GGA CAC AGA  240
Leu Ala Asp Arg Thr Glu Ala Glu Leu Lys Ala Leu Arg Gly His Arg
65                  70                  75                  80

CCT TCC TCC GGT TAT AAT GGC GGT TTA CCC TTT CCT CAC AAT ACC ACC  288
Pro Ser Ser Gly Tyr Asn Gly Gly Leu Pro Phe Pro His Asn Thr Thr
                85                  90                  95

AAG GAA GCA AGA AAT TTA CCA GAT TCT TTC GAC TGG CGA ATT TAT GGA  336
Lys Glu Ala Arg Asn Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly
            100                 105                 110

GCT GTT ACT CCA GTT AAA GAT CAA TCT GTT TGT GGT TCC TGC TGG TCT  384
Ala Val Thr Pro Val Lys Asp Gln Ser Val Cys Gly Ser Cys Trp Ser
        115                 120                 125

TTC GGA ACA ATT GGA GCA ATC GAA GGT GCA TAT TTC TTG AAA AAC GGC  432
Phe Gly Thr Ile Gly Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly
    130                 135                 140

GGT AAT CTT GTA CGA TTG TCT CAA CAG GCT TTG ATT GAT TGT TCT TGG  480
Gly Asn Leu Val Arg Leu Ser Gln Gln Ala Leu Ile Asp Cys Ser Trp
145                 150                 155                 160

GGA TAT GGA AAT AAT GGT TGC GAC GGT GGC GAA GAT TTC CGC GCC TAC  528
Gly Tyr Gly Asn Asn Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr
                165                 170                 175

CAA TGG ATG ATG AAA CAT GGA GGA ATC CCT ACT GAA GAG GAT TAT GGT  576
Gln Trp Met Met Lys His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly
            180                 185                 190

GGT TAC TTG GGA CAA GAT GGT TAC TGC CAT GTC AAC AAC GTT ACT TTA  624
Gly Tyr Leu Gly Gln Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu
        195                 200                 205

GTT GCT CCC ATC ACA GGA TAT GTC AAC GTA ACT CGT AAC GAT GTT GAC  672
Val Ala Pro Ile Thr Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp
    210                 215                 220

GCT ATG AAG GTT GCC CTT CTT AAA CAC GGT CCG ATT TCG GTG GCC ATT  720
Ala Met Lys Val Ala Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile
225                 230                 235                 240

GAC GCA TCA CAC AAA ACA TCC AGT TTT TAC TCC AAC GGC GTT TAC TAC  768
Asp Ala Ser His Lys Thr Ser Ser Phe Tyr Ser Asn Gly Val Tyr Tyr
                245                 250                 255

CAA CCG AAA TGT GGC AAT AAA AGA GGA CAA TTA GAC CAC GCC GTA TTA  816
Gln Pro Lys Cys Gly Asn Lys Arg Gly Gln Leu Asp His Ala Val Leu
            260                 265                 270

GTA GTC GGT TAT GGT GAA ATC AAC AGC GAA CCT TAC TGG TTG GTC AAG  864
Val Val Gly Tyr Gly Glu Ile Asn Ser Glu Pro Tyr Trp Leu Val Lys
        275                 280                 285

AAT CCT GGT CAA TTG TGG GGA AAC AAT GTT ATA TTT GAT GTC GCC AAA  912
Asn Pro Gly Gln Leu Trp Gly Asn Asn Val Ile Phe Asp Val Ala Lys
    290                 295                 300

AAT AAT AAT GCG GAT TTG ACG ATC ACT TAT TTA CTA TGT ACT ACT TTT  960
Asn Asn Asn Ala Asp Leu Thr Ile Thr Tyr Leu Leu Cys Thr Thr Phe
305                 310                 315                 320

AAA ATT GAT TTT TAC GGC ATG TAA  984
Lys Ile Asp Phe Tyr Gly Met
                325

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
     LENGTH: 327 amino acids
     TYPE: amino acid
     TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Arg Glu Phe Val His Pro His Phe Thr Glu His Ile Asp Glu Glu
1               5                   10                  15

Phe His Arg Phe Ile Asn Lys His Gly Lys Ile Tyr Asn Lys Asn Glu
            20                  25                  30

Glu His His Phe Arg Lys Glu Ile Phe Arg Leu Asn Leu Arg Tyr Ile
        35                  40                  45

Phe Ser Lys Asn Arg Ala Asn Leu Gly Tyr Thr Leu Thr Val Asn His
    50                  55                  60

Leu Ala Asp Arg Thr Glu Ala Glu Leu Lys Ala Leu Arg Gly His Arg
65                  70                  75                  80

Pro Ser Ser Gly Tyr Asn Gly Gly Leu Pro Phe Pro His Asn Thr Thr
                85                  90                  95

Lys Glu Ala Arg Asn Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly
            100                 105                 110

Ala Val Thr Pro Val Lys Asp Gln Ser Val Cys Gly Ser Cys Trp Ser
        115                 120                 125

Phe Gly Thr Ile Gly Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly
    130                 135                 140

Gly Asn Leu Val Arg Leu Ser Gln Gln Ala Leu Ile Asp Cys Ser Trp
145                 150                 155                 160

Gly Tyr Gly Asn Asn Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr
                165                 170                 175

Gln Trp Met Met Lys His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly
            180                 185                 190

Gly Tyr Leu Gly Gln Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu
        195                 200                 205

Val Ala Pro Ile Thr Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp
    210                 215                 220

Ala Met Lys Val Ala Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile
225                 230                 235                 240

Asp Ala Ser His Lys Thr Ser Ser Phe Tyr Ser Asn Gly Val Tyr Tyr
                245                 250                 255

Gln Pro Lys Cys Gly Asn Lys Arg Gly Gln Leu Asp His Ala Val Leu
            260                 265                 270

Val Val Gly Tyr Gly Glu Ile Asn Ser Glu Pro Tyr Trp Leu Val Lys
        275                 280                 285

Asn Pro Gly Gln Leu Trp Gly Asn Asn Val Ile Phe Asp Val Ala Lys
    290                 295                 300

Asn Asn Asn Ala Asp Leu Thr Ile Thr Tyr Leu Leu Cys Thr Thr Phe
305                 310                 315                 320

Lys Ile Asp Phe Tyr Gly Met
                325

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
     LENGTH: 981 base pairs
     TYPE: nucleic acid
     STRANDEDNESS: single
     TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CATGCCGTAA AAATCAATTT TAAAAGTAGT ACATAGTAAA TAAGTGATCG TCAAATCCGC  60
ATTATTATTT TTGGCGACAT CAAATATAAC ATTGTTTCCC CACAATTGAC CAGGATTCTT  120
GACCAACCAG TAAGGTTCGC TGTTGATTTC ACCATAACCG ACTACTAATA CGGCGTGGTC  180
TAATTGTCCT CTTTTATTGC CACATTTCGG TTGGTAGTAA ACGCCGTTGG AGTAAAAACT  240
GGATGTTTTG TGTGATGCGT CAATGGCCAC CGAAATCGGA CCGTGTTTAA GAAGGGCAAC  300
CTTCATAGCG TCAACATCGT TACGAGTTAC GTTGACATAT CCTGTGATGG GAGCAACTAA  360
AGTAACGTTG TTGACATGGC AGTAACCATC TTGTCCCAAG TAACCACCAT AATCCTCTTC  420
AGTAGGGATT CCTCCATGTT TCATCATCCA TTGGTAGGCG CGGAAATCTT CGCCACCGTC  480
GCAACCATTA TTTCCATATC CCCAAGAACA ATCAATCAAA GCCTGTTGAG ACAATCGTAC  540
AAGATTACCG CCGTTTTTCA AGAAATATGC ACCTTCGATT GCTCCAATTG TTCCGAAAGA  600
CCAGCAGGAA CCACAAACAG ATTGATCTTT AACTGGAGTA ACAGCTCCAT AAATTCGCCA  660
GTCGAAAGAA TCTGGTAAAT TTCTTGCTTC CTTGGTGGTA TTGTGAGGAA AGGGTAAACC  720
GCCATTATAA CCGGAGGAAG GTCTGTGTCC TCTCAAAGCC TTAAGTTCAG CTTCAGTACG  780
ATCAGCCAAA TGGTTAACAG TCAAAGTGTA TCCCAAATTT GCACGATTCT TAGAAAAAAT  840
GTACCTCAAG TTTAGTCTGA AAATTTCTTT GCGGAAATGA TGTTCTTCAT TTTTATTATA  900
AATTTTTCCG TGTTTATTGA TGAATCGGTG GAATTCTTCA TCAATATGTT CGGTAAAATG  960
GGGATGCACG AATTCCCTCA T  981

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
     LENGTH: 681 base pairs
     TYPE: nucleic acid
     STRANDEDNESS: single
     TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
     NAME/KEY: CDS
     LOCATION: 1..678
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

TTA CCA GAT TCT TTC GAC TGG CGA ATT TAT GGA GCT GTT ACT CCA GTT  48
Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly Ala Val Thr Pro Val
1               5                   10                  15

AAA GAT CAA TCT GTT TGT GGT TCC TGC TGG TCT TTC GGA ACA ATT GGA  96
Lys Asp Gln Ser Val Cys Gly Ser Cys Trp Ser Phe Gly Thr Ile Gly
            20                  25                  30

GCA ATC GAA GGT GCA TAT TTC TTG AAA AAC GGC GGT AAT CTT GTA CGA  144
Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly Gly Asn Leu Val Arg
        35                  40                  45

TTG TCT CAA CAG GCT TTG ATT GAT TGT TCT TGG GGA TAT GGA AAT AAT  192
Leu Ser Gln Gln Ala Leu Ile Asp Cys Ser Trp Gly Tyr Gly Asn Asn
    50                  55                  60

GGT TGC GAC GGT GGC GAA GAT TTC CGC GCC TAC CAA TGG ATG ATG AAA  240
Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr Gln Trp Met Met Lys
65                  70                  75                  80

CAT GGA GGA ATC CCT ACT GAA GAG GAT TAT GGT GGT TAC TTG GGA CAA  288
His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly Gly Tyr Leu Gly Gln
                85                  90                  95

174 GAT GGT TAC TGC CAT GTC AAC AAC GTT ACT TTA GTT GCT CCC ATC ACA 336 Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu Val Ala Pro Ile Thr 100 105 110 GGA TAT GTC AAC GTA ACT CGT AAC GAT GTT GAC GCT ATG AAG GTT GCC 384 Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp Ala Met Lys Val Ala 115 120 125 CTT CTT AAA CAC GGT CCG ATT TCG GTG GCC ATT GAC GCA TCA CAC AAA 432 Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile Asp Ala Ser His Lys 130 135 140 ACA TCC AGT TTT TAC TCC AAC GGC GTT TAC TAC CAA CCG AAA TGT GGC 480 Thr Ser Ser Phe Tyr Ser Asn Gly Val Tyr Tyr Gin Pro Lys Cys Gly 145 150 155 160 AAT AAA AGA GGA CAA TTA GAC CAC GCC GTA TTA GTA GTC GGT TAT GGT 528 Asn Lys Arg Gly Gin Leu Asp His Ala Val Leu Val Val Gly Tyr Gly 165 170 175 GAA ATC AAC AGC GAA CCT TAC TGG TTG GTC AAG AAT CCT GGT CAA TTG 576 Glu Ile Asn Ser Glu Pro Tyr Trp Leu Val Lys Asn Pro Gly Gin Leu 180 185 190 TGG GGA AAC AAT GTT ATA TTT GAT GTC GCC AAA AAT AAT AAT GCG GAT 624 Trp Gly Asn Asn Val Ile Phe Asp Val Ala Lys Asn Asn Asn Ala Asp 195 200 205 TTG ACG ATC ACT TAT TTA CTA TGT ACT ACT TTT AAA ATT GAT TTT TAC 672 Leu Thr Ile Thr Tyr Leu Leu Cys Thr Thr Phe Lys Ile Asp Phe Tyr 210 215 220 GGC ATG TAA 681 Gly Met 225 INFORMATION FOR SEQ ID NO:8: SEQUENCE CHARACTERISTICS: LENGTH: 226 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly Ala Val Thr Pro Val 1 5 10 Lys Asp Gin Ser Val Cys Gly Ser Cys Trp Ser Phe Gly Thr Ile Gly 25 Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly Gly Asn Leu Val Arg 35 40 Leu Ser Gin Gin Ala Leu Ile Asp Cys Ser Trp Gly Tyr Gly Asn Asn 55 Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr Gin Trp Met Met Lys 70 75 vv WO 97/40058 PCT/US97/06121 Hi As Gi Le Th 14 As Gl Tr Le Gl 22 (2

T(

SE

TJ

T

cc

A

G

s p y u .r 5 n u :5 u y Gly Gly Tyr Leu 130 Ser Lys Ile Gly Thr 210 Met Gly Tyr Val 115 Lys Ser Arg Asn Asn 195 Ile Ile Cys 100 Asn His Phe Gly Ser 180 Asn Thr Pro His Val Gly Tyr Gin 165 Glu Val Tyr Thr Val Thr Pro Ser 150 Leu Pro Ile Leu Glu Asn Arg Ile 135 Asn Asp Tyr Phe Leu 215 Glu Asn Asn 120 Ser Gly His Trp Asp 200 Cys 175 Asp Tyr 90 Val Thr 105 Asp Val Val Ala Val Tyr Ala Val 170 Leu Val 185 Val Ala Thr Thr Gly Leu Asp Ile Tyr 155 Leu Lys Lys Phe Gly Val Ala Asp 140 Gin Val Asn Asn Lys 220 Tyr Ala Met 125 Ala Pro Val Pro Asn 205 Ile Leu Pro 110 Lys Ser Lys Gly Gly 190 Asn Asp Gly Ile Val His Cys Tyr 175 Gin Ala Phe Gin Thr Ala Lys Gly 160 Gly Leu Asp Tyr INFORMATION FOR SEQ ID NO:9: SEQUENCE CHARACTERISTICS: LENGTH: 775 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..685 OTHER INFORMATION: /note= "At pos. bp 559, change A to N. At pos. aa 187, substitute Xaa." (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: CA AAT CGG ATT GTT AAT GGA GTT AAT GCC AAA AAC GGT TCT GCT CCA er Asn Arg Ile Val Asn Gly Val Asn Ala Lys Asn Gly Ser Ala Pro 1 5 10 AT ATG GCT TCT CTA AGA GAT GTT ATG GAA ACC ATT TCT GTG GAG CAT yr Met Ala Ser Leu Arg Asp Val Met Glu Thr Ile Ser Val Glu His 25 GA TAT TGG ATG AAC CGC TGG ATT CTT ACT GCT GCC CAT TGC CTT ACT rg Tyr Trp Met Asn Arg Trp Ile Leu Thr Ala Ala His Cys Leu Thr 40 AC GGT TAT CTA GAT ACA GTC TAC GTT GGT TCA AAT CAT CTT TCT GGC WO 97/40058 PCT/US97/06121 176 Asp Gly Tyr Leu Asp Thr Val Tyr Val Gly Ser Asn His Leu Ser Gly 55 GAC GGA GAG TAC TAC AAT GTA GAA GAA CAA GTC ATC CAT GAT AAA TAT 240 Asp Gly Glu Tyr Tyr Asn Val Glu Glu Gin Val Ile His Asp Lys Tyr 65 70 75 TTT GGT CAA ACA ACC GGC TTC AAA AAT GAT ATT GCT CTC GTC AAA GTT 288 Phe Gly Gin Thr Thr Gly Phe Lys Asn Asp Ile Ala Leu Val Lys Val 90 TCT AGT GCT ATA AAA CTT AGC AAA AAT GTT CGT CCC ATC AAA TTG CAC 336 Ser Ser Ala Ile Lys Leu Ser Lys Asn Val Arg Pro Ile Lys Leu His 100 105 110 AAA GAT TTT ATA CGC 
GGA GGT GAA AAA TTG AAA ATT ACT GGA TGG GGA 384 Lys Asp Phe Ile Arg Gly Gly Glu Lys Leu Lys Ile Thr Gly Trp Gly 115 120 125 TTG ACC AAT CAA ACT CAT GGT GAA GTT CCT GAT GCT CTT CAA GAG TTA 432 Leu Thr Asn Gin Thr His Gly Glu Val Pro Asp Ala Leu Gin Glu Leu 130 135 140 CAG GTA GAA GCA CTT TCT AAC TCT AAA TGC AAG GCA ATT ACT GGT GTC 480 Gin Val Glu Ala Leu Ser Asn Ser Lys Cys Lys Ala Ile Thr Gly Val 145 150 155 160 CAT CTT CCT GCT CAT CTC TGC ACC TTC AAA GCA CCT CAA AAG GGT GTA 528 His Leu Pro Ala His Leu Cys Thr Phe Lys Ala Pro Gin Lys Gly Val 165 170 175 TGC ATG GGT GAC TCT GGT GGT CCT CTG GTC NAT AAG GGC AAG CAA GTT 576 Cys Met Gly Asp Ser Gly Gly Pro Leu Val Xaa Lys Gly Lys Gin Val 180 185 190 GGA GTC ACA TCT TTC GTC TGG GAA GGT TGT GCT TTG GGC AAC CCT GAT 624 Gly Val Thr Ser Phe Val Trp Glu Gly Cys Ala Leu Gly Asn Pro Asp 195 200 205 TTC TTT ACA AGA GTT TCG CTT TAT GTA GAC TGG GTC AAA AAG ATT CAA 672 Phe Phe Thr Arg Val Ser Leu Tyr Val Asp Trp Val Lys Lys Ile Gin 210 215 220 AAA GAA TAT AAA T GATATGTTGA TTGTCACTAA AATGCATCGA TTTGGATAAT 725 Lys Glu Tyr Lys 225 TTGGTTGTGA ATATAATTTT ATTTCTAGCA TCAAAAAAAA AAAAAAAAAA 775 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 228 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID WO 97/40058 WO 9740058PCTIUS97/06121 1~77 Ser Asn Arg Ile Val Asn Gly Val Asn Ala Tyr Arg Asp Asp 65 Phe Ser Lys Leu Gin 145 His Cys Gly Phe Lys 225 Met Tyr Gi y Gi y Gi y Ser Asp Thr 130 Val Leu Met Val Phe 210 Giu Al a T rp Tyr Gi u Gin Al a Phe 115 Asn Glu Pro Gi y Thr 195 Thr Tyr Se r Met Leu Tyr Thr Ile 100 Ile Gin Al a Al a Asp 180 Ser Arg Lys Leu Asn Asp Tyr Th r Lys Arg Thr Leu His 165 Ser Phe Val Arg Arg Thr Asn 70 Gi y Leu Gi y His Ser 150 Leu Gi y Val Asp T rp Val 55 Val Phe Ser Gi y Gi y 135 Asn Cys Gi y Trp, Val1 Ile 40 Tyr Giu Lys Lys Glu 120 Gi u Ser Thr Pro Gi u 200 Met 25 Leu Val Giu Asn As n 105 Lys Val Lys Phe Leu 185 Gi y Lys Thr Al a Ser Vai 75 Ile Arg Lys Asp Lys 
155 Al a Xaa Al a Asn Ile Al a Asn Ile Ala Pro Ile Ala 140 Ala Pro Lys Leu Gi y Ser His His His Leu Ile Thr 125 Leu Ile Gin Gly Gi y 205 Ser Val Cys Leu Asp Val Lys 110 Gi y Gin Thr Lys Lys 190 Asn Al a Giu Leu Ser Lys Lys Leu T rp Gi u Gly Gi y 17 Gin Pro Pro His Thr Gi y Tyr Val His Gi y Le u Val 160 Val Val Asp Ser Leu Tyr Val Asp Trp Vai Lys Lys Ile Gin INFORMATION FOR SEQ ID NO:1i: SEQUENCE CHARACTERISTICS: LENGTH: 775 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: iinear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:ii: TTTTTTTTTT TTTTTTTTGA TGCTAGAAAT AAAATTATAT TCACAACCAA ATTATCCAAA TCGATGCATT TTAGTGACAA TCAACATATC ATTTATATTC TTTTTGAATC TTTTTGACCC AGTCTACATA AAGCGAAACT CTTGTAAAGA AATCAGGGTT GCCCAAAGCA CAACCTTCCC WO 97/40058 WO 9740058PCTIUS97/06121 178

AGACGAAAGA

CCATGCATAC

CAGTAATTGC

CAGGAACTTC

CTCCGCGTAT

CACTAGAAAC

TAT CAT GGAT

AACCAACGTA

AGCGGTTCAT

CCATATATGG

TGTGACTCCA

ACCCTTTTGA

CTTGCATTTA

ACCATGAGTT

AAAATCTTTG

TTTGACGAGA

GACTTGTTCT

GACTGTATCT

CCAATATCGA

AGCAGAACCG

ACTTGCTTGC

GGTGCTTTGA

GAGTTAGAAA

TGATTGGTCA

TGCAATTTGA

GCAATATCAT

TCTACATTGT

AGATAACCGT

TGCTCCACAG

TTTTTGGCAT

CCTTATNGAC

AGGTGCAGAG

GTGCTTCTAC

ATC C CCATC C

TGGGACGAAC

TTTTGAAGCC

AGTACTCTCC

CAGTAAGGCA

AAATGGTTTC

TAACTCCATT

CAGAGGACCA

ATGAGCAGGA

CTGTAACTCT

AGTAATTTTC

ATTTTTGCTA

GGTTGTTTGA

GTCGCCAGAA

ATGGGCAGCA

CATAACATCT

AACAAT C CGA

CCAGAGTCAC

AGATGGACAC

T GAAGAGCAT

AATTTTTCAC

AGTTTTATAG

CCAAAATATT

AGATGATTTG

GTAAGAATCC

CTTAGAGAAG

TTTGA

240 300 360 420 480 540 600 660 720 775 INFORMATION FOR SEQ ID NO:12: SEQUENCE CHARACTERISTICS: LENGTH: 675 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (ix) FEATURE: NAME/KEY: CDS LOCATION: 1. .675 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: ATT GTT PAT GGA GTT PAT GCC AAA AAC GGT TCT GCT CCA TAT ATG GCT Ile Val Asn Gly Val Asn Ala Lys Asn Gly Ser Ala Pro Tyr Met Ala 1 5 10 TCT CTA AGA GAT GTT ATG GAA ACC ATT TCT GTG GAG CAT CGA TAT TGG Ser Leu Arg Asp Val Met Giu Thr Ile Ser Val Giu His Arg Tyr Trp 25 ATG AAC CGC TGG ATT CTT ACT GCT GCC CAT TGC CTT ACT GAC GGT TAT Met Asn Arg Trp Ile Leu Thr Ala Ala His Cys Leu Thr Asp Gly Tyr 35 40 CTA GAT ACA GTC TAC GTT GGT TCA PAT CAT CTT TCT GGC GAC GGA GAG Leu Asp Thr Vai Tyr Val Gly Ser Asn His Leu Ser Gly Asp Giy Glu 55 TAC TAC PAT GTA GAA GAA CA GTC ATC CAT GAT AAA TAT TTT GGT CA Tyr Tyr Asn Val Giu Glu Gin Val Ile His Asp Lys Tyr Phe Giy Gin 70 75 ACA ACC GGC TTC AAA PAT GAT ATT GCT CTC GTC AAA GTT TCT AGT GCT Thr Thr Gly Phe Lys Asn Asp Ile Ala Leu Val Lys Vai Ser Ser Ala 90 ATA PAA CTT AGC AAA PAT GTT CGT CCC ATC AAA TTG CAC AAA GAT TTT Ile Lys Leu Ser Lys Asn Val Arg Pro Ile Lys Leu His Lys Asp Phe 144 192 240 288 336 WO 97/40058 PCT/US97/06121.

179

ATA

Ile

CAA

Gin

GCA

Ala 145

GCT

Ala

GAC

Asp

TCT

Ser

AGA

Arg

AAA

Lys 225 (2)

CGC

Arg

ACT

Thr 130

CTT

Leu 100 105 110 GGA GGT GAA AAA TTG AAA ATT ACT GGA TGG GGA TTG ACC AAT Gly Gly Glu Lys Leu Lys Ile Thr Gly Trp Gly Leu Thr Asn 115 120 125 CAT GGT GAA GTT CCT GAT GCT CTT CAA GAG TTA CAG GTA GAA His Gly Glu Val Pro Asp Ala Leu Gin Glu Leu Gin Val Glu 135 140 TCT AAC TCT AAA TGC AAG GCA ATT ACT GGT GTC CAT CTT CCT Ser Asn Ser Lys Cys Lys Ala Ile Thr Gly Val His Leu Pro 150 155 160 CTC TGC ACC TTC AAA GCA CCT CAA AAG GGT GTA TGC ATG GGT Leu Cys Thr Phe Lys Ala Pro Gin Lys Gly Val Cys Met Gly 165 170 175 GGT GGT CCT CTG GTC NAT AAG GGC AAG CAA GTT GGA GTC ACA Gly Gly Pro Leu Val Xaa Lys Gly Lys Gin Val Gly Val Thr 180 185 190 GTC TGG GAA GGT TGT GCT TTG GGC AAC CCT GAT TTC TTT ACA Val Trp Glu Gly Cys Ala Leu Gly Asn Pro Asp Phe Phe Thr 195 200 205 TCG CTT TAT GTA GAC TGG GTC AAA AAG ATT CAA AAA GAA TAT Ser Leu Tyr Val Asp Trp Val Lys Lys Ile Gin Lys Glu Tyr 215 220 528 576 624 672 Ile 1 Ser Met Leu Tyr INFORMATION FOR SEQ ID NO:13: SEQUENCE CHARACTERISTICS: LENGTH: 225 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Val Asn Gly Val Asn Ala Lys Asn Gly 5 10 Leu Arg Asp Val Met Glu Thr Ile Ser 25 Asn Arg Trp Ile Leu Thr Ala Ala His 40 Asp Thr Val Tyr Val Gly Ser Asn His 50 55 Tyr Asn Val Glu Glu Gin Val Ile His 70 NO: 13: Ser Ala Pro Tyr Met Ala Val Glu His Arg Tyr Tr.p Cys Leu Thr Asp Gly Tyr Leu Ser Gly Asp Gly Glu Asp Lys Tyr Phe Gly Gin 75 WO 97/40058 WO 9740058PCTIEJS97/06121 180 Thr Thr Gly Phe Lys Asn Asp Ile Ala Leu 90 Val Lys Val Ser Ser Ala Ile Lys Leu Ile Arg Gly 115 Gin Thr His Lys Asn Vai Arg Ile Lys Leu His Giu Lys Leu Lys 120 Asp Thr Gly Trp Lys Asp Phe 110 Leu Thr Asn Gin Vai Glu Giy Giu Val Ala Leu Gin 130 Leu Glu 140 Gi y Al a 145 Al a Ser Asn Ser Cys Lys Ala Ile Thr 155 Lys Val His Leu Pro 160 His Leu Cys Thr 165 Pro Lys Ala Pro Gly Val. 
Cys Met Giy 175 Asp Ser Giy Ser Phe Val 195 Arg Vai Ser 210 Leu Val Xaa Lys Gin Val Giu Giy Cys Al a 200 T rp Gly Asn Pro Gly Vai Thr 190 Phe Phe Thr Lys Giu Tyr Leu Tyr Val Asp 215 Val Lys Lys Ile 220 Lys 225 INFORMATION FOR SEQ ID NO:14: SEQUENCE CHARACTERISTICS: LENGTH: 675 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: TTTATATTCT TTTTGAATCT

ATCAGGGTTG

CTTATNGACC

GGTGCAGAGA

TGCTTCTACC

TCCCCATCCA

GGGACGAACA

TTTGAAGCCG

GTACTCTCCG

AGTAAGGCAA

CCCAAAGCAC

AGAGGACCAC

TGAGCAGGAA

TGTAACTCTT

GTAATTTTCA

TTTTTGCTAA

GTTGTTTGAC

TCGCCAGAAA

TGGGCAGCAG

TTTTGACCCA GTCTACATAA AACCTTCCCA GACGAAAGAT CAGAGTCACC CAT GCATACA GATGGACACC AGTAATTGCC GAAGAGCATC AGGAACTTCA ATTTTTCACC TCCGCGTATA GTTTTATAGC ACTAGAAACT CAAAATATTT ATCATGGATG GATGATTTGA ACCAACGTAG TAAGAATCCA GCGGTTCAT C

AGCGAAACTC

GTGACTCCAA

CCCTTTTGAG

TTGCATTTAG

CCATGAGTTT

AAATCTTTGT

TTGACGAGAG

ACTTGTTCTT

ACTGTATCTA

CAATATCGAT

TTGTAAAGAA

CTTGCTTGCC

GTGCTTTGAA

AGTTAGAAAG

GATTGGTCAA

GCAATTTGAT

CAATATCATT

CTACATTGTA

GATAACCGTC

GCTCCACAGA

120 180 240 300 360 420 480 540 600 WO 97/40058 PCTIUS97/06121 181 AATGGTTTCC ATAACATCTC TTAGAGAAGC CATATATGGA GCAGAACCGT TTTTGGCATT 660 AACTCCATTA ACAAT 675 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 1089 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 33..807 (xi) SEQUENCE DESCRIPTION: SEQ ID GTTTTTAGTG CGGTGTTAAA TAAAAGTTTA AA ATG AAA CTC ATC ATC GTG CTA 53 Met Lys Leu Ile Ile Val Leu 1 GCA TTT GTT TTA GGA ATT TGT TCA GGT TCT CCA CAT TCA AGA ATA ATT 101 Ala Phe Val Leu Gly Ile Cys Ser Gly Ser Pro His Ser Arg Ile Ile 15 TGT GGT CAA AAT GCC AAA AAA AAT TCG GCT CCA TAC ATG GCA TCG GTT 149 Cys Gly Gin Asn Ala Lys Lys Asn Ser Ala Pro Tyr Met Ala Ser Val 30 CAA CTT TTA GAT AAA GTT GAA GGA GTC GAA AAA TTG TTT CAT TTT TGC 197 Gin Leu Leu Asp Lys Val Glu Gly Val Glu Lys Leu Phe His Phe Cys 40 45 50 GGA GGA GCA ATA GTT AAT GAT AGA TGG ATT TTG ACT GCT GCA CAT TGT 245 Gly Gly Ala Ile Val Asn Asp Arg Trp Ile Leu Thr Ala Ala His Cys 65 TTG AGA GGC AAA GAC CAC CTC CTG GAC AAA CTG TTC ATT GCA GTC GGC 293 Leu Arg Gly Lys Asp His Leu Leu Asp Lys Leu Phe Ile Ala Val Gly 80 CTG ACA AAT TTA GGT GAA GGA GGC ACC GTG TAT CCT GTA GAA AAA GGC 341 Leu Thr Asn Leu Gly Glu Gly Gly Thr Val Tyr Pro Val Glu Lys Gly 95 100 ATC ATG CAC GAA GAA TAT GRA CAT TAT GAC ATA GTC AAC GAT ATT GCA 389 Ile Met His Glu Glu Tyr Glu His Tyr Asp Ile Val Asn Asp Ile Ala 105 110 115 CTA ATC AAA GTC AAA TCT CCG ATA GAA TTC AAT GAA AAA GTA ACG ACT 437 Leu Ile Lys Val Lys Ser Pro Ile Glu Phe Asn Glu Lys Val Thr Thr 120 125 130 135 GTA AAA TTA GGT GAG GAT TAT GTT GGC GGA GAC GTC CAA CTT CGA TTG 485 Val Lys Leu Gly Glu Asp Tyr Val Gly Gly Asp Val Gin Leu Arg Leu 140 145 150 ACA GGA TGG GGA GTT ACG ACA AAT GAG GGA ATC GGA AGC CCG AGT CAA 533 WO 97/40058 PCT/US97/06121 182 Thr Gly Trp Gly Val Thr Thr Asn Glu Gly Ile Gly Ser Pro Ser Gin 155 160 165 AAA TTA CAG GTC ATG ACA GCC AAA TCA CTA ACT TAT GAG GAT TGC AAA 581 Lys Leu 
Gin Val Met Thr Ala Lys Ser Leu Thr Tyr Glu Asp Cys Lys 170 175 180 AAC GCA ATT TAT AAA AAG ACT TTC GAA AGC CAA ATT TGT GCA CAG GCT 629 Asn Ala Ile Tyr Lys Lys Thr Phe Glu Ser Gin Ile Cys Ala Gin Ala 185 190 195 AAA AAA GGA ACC GGA TCT TGT AAG GGT GAT TCT GGT GGT CCA TTA GTC 677 Lys Lys Gly Thr Gly Ser Cys Lys Gly Asp Ser Gly Gly Pro Leu Val 200 205 210 215 CAA GGA AAC AAT ACA TTG GTC GGT TTA GTA TCC TGG GGT ATG CAA CCT 725 Gin Gly Asn Asn Thr Leu Val Gly Leu Val Ser Trp Gly Met Gin Pro 220 225 230 TGT GGA AGT GGT TAT TAT CCT GAC GTT TAC ACA AGA ATT ACA TCG TTT 773 Cys Gly Ser Gly Tyr Tyr Pro Asp Val Tyr Thr Arg Ile Thr Ser Phe 235 240 245 TTG GAC TGG ATT AAC ACG ACA ATG TCA GAA AAT T AAAGAAAAAA 817 Leu Asp Trp Ile Asn Thr Thr Met Ser Glu Asn 250 255 AAATATCAAA ATAGTAGTTA AAATATTTTG TAGACTACTG TAAAAGTATT GAAATTAACA 877 AATATTTGTT TTGTATATAG GGCTGGTACC TAAAGACATT ATTATTTGTT AAAACTGATA 937 TTTATTTTTA TGACGAATTT CTATATTTAT AATATTATTT TATATGTATG TATTTGATTG 997 TATTTTAAAT ATGGACTGGT TGTTCCCAAT GTTATAAATT AAAATTATAA AAATACATCT 1057 ACTATTTTAT ACTAAAAAAA AAAAAAAAAA AA 1089 INFORMATION FOR SEQ ID NO:16: SEQUENCE CHARACTERISTICS: LENGTH: 258 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: Met Lys Leu Ile Ile Val Leu Ala Phe Val Leu Gly Ile Cys Ser Gly 1 5 10 Ser Pro His Ser Arg Ile Ile Cys Gly Gin Asn Ala Lys Lys Asn Ser 25 Ala Pro Tyr Met Ala Ser Val Gin Leu Leu Asp Lys Val Glu Gly Val 40 Glu Lys Leu Phe His Phe Cys Gly Gly Ala Ile Val Asn Asp Arg Trp 50 55 Ile Leu Thr Ala Ala His Cys Leu Arg Gly Lys Asp His Leu Leu Asp 70 75 WO 97/40058 WO 9740058PCT/US97/06121.

Lys Leu Phe Ile Ala Val Gly Leu Thr Asn Leu Gly Glu Gly Gly Thr  96
Val Tyr Pro Val Glu Lys Gly Ile Met His Glu Glu Tyr Glu His Tyr  112
Asp Ile Val Asn Asp Ile Ala Leu Ile Lys Val Lys Ser Pro Ile Glu  128
Phe Asn Glu Lys Val Thr Thr Val Lys Leu Gly Glu Asp Tyr Val Gly  144
Gly Asp Val Gln Leu Arg Leu Thr Gly Trp Gly Val Thr Thr Asn Glu  160
Gly Ile Gly Ser Pro Ser Gln Lys Leu Gln Val Met Thr Ala Lys Ser  176
Leu Thr Tyr Glu Asp Cys Lys Asn Ala Ile Tyr Lys Lys Thr Phe Glu  192
Ser Gln Ile Cys Ala Gln Ala Lys Lys Gly Thr Gly Ser Cys Lys Gly  208
Asp Ser Gly Gly Pro Leu Val Gln Gly Asn Asn Thr Leu Val Gly Leu  224
Val Ser Trp Gly Met Gln Pro Cys Gly Ser Gly Tyr Tyr Pro Asp Val  240
Tyr Thr Arg Ile Thr Ser Phe Leu Asp Trp Ile Asn Thr Thr Met Ser  256
Glu Asn  258
INFORMATION FOR SEQ ID NO:17:
SEQUENCE CHARACTERISTICS:
LENGTH: 1089 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
TTTTTTTTTT TTTTTTTTTA GTATAAAATA GTAGATGTAT TTTTATAATT TTAATTTATA  60
ACATTGGGAA CAACCAGTCC ATATTTAAAA TACAATCAAA TACATACATA TAAAATAATA  120
TTATAAATAT AGAAATTCGT CATAAAAATA AATATCAGTT TTAACAAATA ATAATGTCTT  180
TAGGTACCAG CCCTATATAC AAAACAAATA TTTGTTAATT TCAATACTTT TACAGTAGTC  240
NACAAAATAT TTTAACTACT ATTTTGATAN TTTTTTNTCT TTAATTTTCT GACATTGTCG  300
TGTTAATCCA GTCCAAAAAC GATGTAATTC TTGTGTAAAC GTCAGGATAA TAACCACTTC  360
CACAAGGTTG CATACCCCAG GATACTAAAC CGACCAATGT ATTGTTTCCT TGGACTAATG  420
GACCACCAGA ATCACCCTTA CAAGATCCGG TTCCTTTTTT AGCCTGTGCA CAAATTTGGC  480
TTTCGAAAGT CTTTTTATAA ATTGCGTTTT TGCAATCCTC ATAAGTTAGT GATTTGGCTG  540
TCATGACCTG TAATTTTTGA CTCGGGCTTC CGATTCCCTC ATTTGTCGTA ACTCCCCATC  600
CTGTCAATCG AAGTTGGACG TCTCCGCCAA CATAATCCTC ACCTAATTTT ACAGTCGTTA  660
CTTTTTCATT GAATTCTATC GGAGATTTGA CTTTGATTAG TGCAATATCG TTGACTATGT  720
CATAATGTTC ATATTCTTCG TGCATGATGC CTTTTTCTAC AGGATACACG GTGCCTCCTT  780
CACCTAAATT TGTCAGGCCG ACTGCAATGA ACAGTTTGTC CAGGAGGTGG TCTTTGCCTC  840
TCAAACAATG TGCAGCAGTC AAAATCCATC TATCATTAAC TATTGCTCCT CCGCAAAAAT  900
GAAACAATTT TTCGACTCCT TCAACTTTAT CTAAAAGTTG AACCGATGCC ATGTATGGAG  960
CCGAATTTTT TTTGGCATTT TGACCACAAA TTATTCTTGA ATGTGGAGAA CCTGAACAAA  1020
TTCCTAAAAC AAATGCTAGC ACGATGATGA GTTTCATTTT AAACTTTTAT TTAACACCGC  1080
ACTAAAAAC  1089
INFORMATION FOR SEQ ID NO:18:
SEQUENCE CHARACTERISTICS:
LENGTH: 774 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: CDS
LOCATION: 1..774
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
ATG AAA CTC ATC ATC GTG CTA GCA TTT GTT TTA GGA ATT TGT TCA GGT  48
Met Lys Leu Ile Ile Val Leu Ala Phe Val Leu Gly Ile Cys Ser Gly  16
TCT CCA CAT TCA AGA ATA ATT TGT GGT CAA AAT GCC AAA AAA AAT TCG  96
Ser Pro His Ser Arg Ile Ile Cys Gly Gln Asn Ala Lys Lys Asn Ser  32
GCT CCA TAC ATG GCA TCG GTT CAA CTT TTA GAT AAA GTT GAA GGA GTC  144
Ala Pro Tyr Met Ala Ser Val Gln Leu Leu Asp Lys Val Glu Gly Val  48
GAA AAA TTG TTT CAT TTT TGC GGA GGA GCA ATA GTT AAT GAT AGA TGG  192
Glu Lys Leu Phe His Phe Cys Gly Gly Ala Ile Val Asn Asp Arg Trp  64
ATT TTG ACT GCT GCA CAT TGT TTG AGA GGC AAA GAC CAC CTC CTG GAC  240
Ile Leu Thr Ala Ala His Cys Leu Arg Gly Lys Asp His Leu Leu Asp  80
AAA CTG TTC ATT GCA GTC GGC CTG ACA AAT TTA GGT GAA GGA GGC ACC  288
Lys Leu Phe Ile Ala Val Gly Leu Thr Asn Leu Gly Glu Gly Gly Thr  96
GTG TAT CCT GTA GAA AAA GGC ATC ATG CAC GAA GAA TAT GAA CAT TAT  336
Val Tyr Pro Val Glu Lys Gly Ile Met His Glu Glu Tyr Glu His Tyr  112
GAC ATA GTC AAC GAT ATT GCA CTA ATC AAA GTC AAA TCT CCG ATA GAA  384
Asp Ile Val Asn Asp Ile Ala Leu Ile Lys Val Lys Ser Pro Ile Glu  128
TTC AAT GAA AAA GTA ACG ACT GTA AAA TTA GGT GAG GAT TAT GTT GGC  432
Phe Asn Glu Lys Val Thr Thr Val Lys Leu Gly Glu Asp Tyr Val Gly  144
GGA GAC GTC CAA CTT CGA TTG ACA GGA TGG GGA GTT ACG ACA AAT GAG  480
Gly Asp Val Gln Leu Arg Leu Thr Gly Trp Gly Val Thr Thr Asn Glu  160
GGA ATC GGA AGC CCG AGT CAA AAA TTA CAG GTC ATG ACA GCC AAA TCA  528
Gly Ile Gly Ser Pro Ser Gln Lys Leu Gln Val Met Thr Ala Lys Ser  176
CTA ACT TAT GAG GAT TGC AAA AAC GCA ATT TAT AAA AAG ACT TTC GAA  576
Leu Thr Tyr Glu Asp Cys Lys Asn Ala Ile Tyr Lys Lys Thr Phe Glu  192
AGC CAA ATT TGT GCA CAG GCT AAA AAA GGA ACC GGA TCT TGT AAG GGT  624
Ser Gln Ile Cys Ala Gln Ala Lys Lys Gly Thr Gly Ser Cys Lys Gly  208
GAT TCT GGT GGT CCA TTA GTC CAA GGA AAC AAT ACA TTG GTC GGT TTA  672
Asp Ser Gly Gly Pro Leu Val Gln Gly Asn Asn Thr Leu Val Gly Leu  224
GTA TCC TGG GGT ATG CAA CCT TGT GGA AGT GGT TAT TAT CCT GAC GTT  720
Val Ser Trp Gly Met Gln Pro Cys Gly Ser Gly Tyr Tyr Pro Asp Val  240
TAC ACA AGA ATT ACA TCG TTT TTG GAC TGG ATT AAC ACG ACA ATG TCA  768
Tyr Thr Arg Ile Thr Ser Phe Leu Asp Trp Ile Asn Thr Thr Met Ser  256
GAA AAT  774
Glu Asn  258
INFORMATION FOR SEQ ID NO:19:
SEQUENCE CHARACTERISTICS:
LENGTH: 258 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
Met Lys Leu Ile Ile Val Leu Ala Phe Val Leu Gly Ile Cys Ser Gly  16
Ser Pro His Ser Arg Ile Ile Cys Gly Gln Asn Ala Lys Lys Asn Ser  32
Ala Pro Tyr Met Ala Ser Val Gln Leu Leu Asp Lys Val Glu Gly Val  48
Glu Lys Leu Phe His Phe Cys Gly Gly Ala Ile Val Asn Asp Arg Trp  64
Ile Leu Thr Ala Ala His Cys Leu Arg Gly Lys Asp His Leu Leu Asp  80
Lys Leu Phe Ile Ala Val Gly Leu Thr Asn Leu Gly Glu Gly Gly Thr  96
Val Tyr Pro Val Glu Lys Gly Ile Met His Glu Glu Tyr Glu His Tyr  112
Asp Ile Val Asn Asp Ile Ala Leu Ile Lys Val Lys Ser Pro Ile Glu  128
Phe Asn Glu Lys Val Thr Thr Val Lys Leu Gly Glu Asp Tyr Val Gly  144
Gly Asp Val Gln Leu Arg Leu Thr Gly Trp Gly Val Thr Thr Asn Glu  160
Gly Ile Gly Ser Pro Ser Gln Lys Leu Gln Val Met Thr Ala Lys Ser  176
Leu Thr Tyr Glu Asp Cys Lys Asn Ala Ile Tyr Lys Lys Thr Phe Glu  192
Ser Gln Ile Cys Ala Gln Ala Lys Lys Gly Thr Gly Ser Cys Lys Gly  208
Asp Ser Gly Gly Pro Leu Val Gln Gly Asn Asn Thr Leu Val Gly Leu  224
Val Ser Trp Gly Met Gln Pro Cys Gly Ser Gly Tyr Tyr Pro Asp Val  240
Tyr Thr Arg Ile Thr Ser Phe Leu Asp Trp Ile Asn Thr Thr Met Ser  256
Glu Asn  258
INFORMATION FOR SEQ ID NO:20:
SEQUENCE CHARACTERISTICS:
LENGTH: 774 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
ATTTTCTGAC ATTGTCGTGT TAATCCAGTC CAAAAACGAT GTAATTCTTG TGTAAACGTC  60
AGGATAATAA CCACTTCCAC AAGGTTGCAT ACCCCAGGAT ACTAAACCGA CCAATGTATT  120
GTTTCCTTGG ACTAATGGAC CACCAGAATC ACCCTTACAA GATCCGGTTC CTTTTTTAGC  180

CTGTGCACAA ATTTGGCTTT CGAAAGTCTT TTTATAAATT GCGTTTTTGC AATCCTCATA  240
AGTTAGTGAT TTGGCTGTCA TGACCTGTAA TTTTTGACTC GGGCTTCCGA TTCCCTCATT  300
TGTCGTAACT CCCCATCCTG TCAATCGAAG TTGGACGTCT CCGCCAACAT AATCCTCACC  360
TAATTTTACA GTCGTTACTT TTTCATTGAA TTCTATCGGA GATTTGACTT TGATTAGTGC  420
AATATCGTTG ACTATGTCAT AATGTTCATA TTCTTCGTGC ATGATGCCTT TTTCTACAGG  480
ATACACGGTG CCTCCTTCAC CTAAATTTGT CAGGCCGACT GCAATGAACA GTTTGTCCAG  540
GAGGTGGTCT TTGCCTCTCA AACAATGTGC AGCAGTCAAA ATCCATCTAT CATTAACTAT  600
TGCTCCTCCG CAAAAATGAA ACAATTTTTC GACTCCTTCA ACTTTATCTA AAAGTTGAAC  660
CGATGCCATG TATGGAGCCG AATTTTTTTT GGCATTTTGA CCACAAATTA TTCTTGAATG  720
TGGAGAACCT GAACAAATTC

CTAAAACAAA TGCTAGCACG ATGATGAGTT TCAT  774
INFORMATION FOR SEQ ID NO:21:
SEQUENCE CHARACTERISTICS:
LENGTH: 711 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: CDS
LOCATION: 1..711
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
ATA ATT TGT GGT CAA AAT GCC AAA AAA AAT TCG GCT CCA TAC ATG GCA  48
Ile Ile Cys Gly Gln Asn Ala Lys Lys Asn Ser Ala Pro Tyr Met Ala  16
TCG GTT CAA CTT TTA GAT AAA GTT GAA GGA GTC GAA AAA TTG TTT CAT  96
Ser Val Gln Leu Leu Asp Lys Val Glu Gly Val Glu Lys Leu Phe His  32
TTT TGC GGA GGA GCA ATA GTT AAT GAT AGA TGG ATT TTG ACT GCT GCA  144
Phe Cys Gly Gly Ala Ile Val Asn Asp Arg Trp Ile Leu Thr Ala Ala  48
CAT TGT TTG AGA GGC AAA GAC CAC CTC CTG GAC AAA CTG TTC ATT GCA  192
His Cys Leu Arg Gly Lys Asp His Leu Leu Asp Lys Leu Phe Ile Ala  64
GTC GGC CTG ACA AAT TTA GGT GAA GGA GGC ACC GTG TAT CCT GTA GAA  240
Val Gly Leu Thr Asn Leu Gly Glu Gly Gly Thr Val Tyr Pro Val Glu  80
AAA GGC ATC ATG CAC GAA GAA TAT GAA CAT TAT GAC ATA GTC AAC GAT  288
Lys Gly Ile Met His Glu Glu Tyr Glu His Tyr Asp Ile Val Asn Asp  96
ATT GCA CTA ATC AAA GTC AAA TCT CCG ATA GAA TTC AAT GAA AAA GTA  336
Ile Ala Leu Ile Lys Val Lys Ser Pro Ile Glu Phe Asn Glu Lys Val  112
ACG ACT GTA AAA TTA GGT GAG GAT TAT GTT GGC GGA GAC GTC CAA CTT  384
Thr Thr Val Lys Leu Gly Glu Asp Tyr Val Gly Gly Asp Val Gln Leu  128
CGA TTG ACA GGA TGG GGA GTT ACG ACA AAT GAG GGA ATC GGA AGC CCG  432
Arg Leu Thr Gly Trp Gly Val Thr Thr Asn Glu Gly Ile Gly Ser Pro  144
AGT CAA AAA TTA CAG GTC ATG ACA GCC AAA TCA CTA ACT TAT GAG GAT  480
Ser Gln Lys Leu Gln Val Met Thr Ala Lys Ser Leu Thr Tyr Glu Asp  160
TGC AAA AAC GCA ATT TAT AAA AAG ACT TTC GAA AGC CAA ATT TGT GCA  528
Cys Lys Asn Ala Ile Tyr Lys Lys Thr Phe Glu Ser Gln Ile Cys Ala  176
CAG GCT AAA AAA GGA ACC GGA TCT TGT AAG GGT GAT TCT GGT GGT CCA  576
Gln Ala Lys Lys Gly Thr Gly Ser Cys Lys Gly Asp Ser Gly Gly Pro  192
TTA GTC CAA GGA AAC AAT ACA TTG GTC GGT TTA GTA TCC TGG GGT ATG  624
Leu Val Gln Gly Asn Asn Thr Leu Val Gly Leu Val Ser Trp Gly Met  208
CAA CCT TGT GGA AGT GGT TAT TAT CCT GAC GTT TAC ACA AGA ATT ACA  672
Gln Pro Cys Gly Ser Gly Tyr Tyr Pro Asp Val Tyr Thr Arg Ile Thr  224
TCG TTT TTG GAC TGG ATT AAC ACG ACA ATG TCA GAA AAT  711
Ser Phe Leu Asp Trp Ile Asn Thr Thr Met Ser Glu Asn  237
INFORMATION FOR SEQ ID NO:22:
SEQUENCE CHARACTERISTICS:
LENGTH: 237 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
Ile Ile Cys Gly Gln Asn Ala Lys Lys Asn Ser Ala Pro Tyr Met Ala  16
Ser Val Gln Leu Leu Asp Lys Val Glu Gly Val Glu Lys Leu Phe His  32
Phe Cys Gly Gly Ala Ile Val Asn Asp Arg Trp Ile Leu Thr Ala Ala  48
His Cys Leu Arg Gly Lys Asp His Leu Leu Asp Lys Leu Phe Ile Ala  64
Val Gly Leu Thr Asn Leu Gly Glu Gly Gly Thr Val Tyr Pro Val Glu  80
Lys Gly Ile Met His Glu Glu Tyr Glu His Tyr Asp Ile Val Asn Asp  96
Ile Ala Leu Ile Lys Val Lys Ser Pro Ile Glu Phe Asn Glu Lys Val  112
Thr Thr Val Lys Leu Gly Glu Asp Tyr Val Gly Gly Asp Val Gln Leu  128
Arg Leu Thr Gly Trp Gly Val Thr Thr Asn Glu Gly Ile Gly Ser Pro  144
Ser Gln Lys Leu Gln Val Met Thr Ala Lys Ser Leu Thr Tyr Glu Asp  160
Cys Lys Asn Ala Ile Tyr Lys Lys Thr Phe Glu Ser Gln Ile Cys Ala  176
Gln Ala Lys Lys Gly Thr Gly Ser Cys Lys Gly Asp Ser Gly Gly Pro  192
Leu Val Gln Gly Asn Asn Thr Leu Val Gly Leu Val Ser Trp Gly Met  208
Gln Pro Cys Gly Ser Gly Tyr Tyr Pro Asp Val Tyr Thr Arg Ile Thr  224
Ser Phe Leu Asp Trp Ile Asn Thr Thr Met Ser Glu Asn  237
INFORMATION FOR SEQ ID NO:23:
SEQUENCE CHARACTERISTICS:
LENGTH: 924 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
NAME/KEY: CDS
LOCATION: 3..800
OTHER INFORMATION: /note= "At pos. bp 132, change A to N; at pos.
bp 202, change A to N; at pos. bp 780, change A to N.

At pos. aa 44, 67 and 260, substitute Xaa." (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: AA ATT TTA TTA CTG GTA TTA TTG GCA GTA TGC TTT GCT TCA GCT AAA 47 Ile Leu Leu Leu Val Leu Leu Ala Val Cys Phe Ala Ser Ala Lys 1 5 10 CGA GGT CCA CGA AAA CAT GTT CGC GAA ACA CAA AAA AGT CTT GCC TCT Arg Gly Pro Arg Lys His Val Arg Glu Thr Gin Lys Ser Leu Ala Ser 20 25 GGG CGT ATT GTG GGT GGT GAA GCA GTG AGC ATT GAA NAC TAT GGA TGG 143 Gly Arg Ile Val Gly Gly Glu Ala Val Ser Ile Glu Xaa Tyr Gly Trp 40 WO 97/40058 PCTIUS97/06121 190 CAA GTT TCT CTA CAA CGT TTT GGC AGT CAT TTC TGT GGA GGA TCT ATA 191 Gin Val Ser Leu Gin Arg Phe Gly Ser His Phe Cys Gly Gly Ser Ile 55 ATA TCC AGT ANA TGG ATT CTT TCA GCT GCT CAT TGC TTT TAT GGA ACG 239 Ile Ser Ser Xaa Trp Ile Leu Ser Ala Ala His Cys Phe Tyr Gly Thr 70 TTA TTT CCG ATT GGA TTC TCT GCG AGA GCC GGC AGC AGT ACT GTG AAT 287 Leu Phe Pro Ile Gly Phe Ser Ala Arg Ala Gly Ser Ser Thr Val Asn 85 90 TCA GGA GGA ACT GTG CAT ACA ATT TTG TAT TGG TAT ATT CAT CCA AAT 335 Ser Giy Gly Thr Val His Thr Ile Leu Tyr Trp Tyr Ile His Pro Asn 100 105 110 TAT GAT TCA CAA AGT ACA GAC TTT CAT GTT TCT GTA GTT CGA CTA TTA 383 Tyr Asp Ser Gin Ser Thr Asp Phe Asp Vai Ser Val Val Arg Leu Leu 115 120 125 TCT TCT TTA PAT TTG PAT GGA GGT TCT ATT CGA CCG GCT AGG TTA GTG 431 Ser Ser Leu Asn Leu Asn Gly Gly Ser Ile Arg Pro Ala Arg Leu Vai 130 135 140 CAT TCT GGA ACT CAT TTG CCA GCC GGT GAG ATG GTT ACA GTA ACT GGA 479 Asp Ser Gly Thr Asp Leu Pro Ala Giy Giu Met Val Thr Vai Thr Gly 145 150 155 TGG GCA CCA CTT TCG GAA PAT ACT TCT GTT CCC TCG CCA TCA ACT CTT 527 Trp Gly Arg Leu Ser Giu Asn Thr Ser Val Pro Ser Pro Ser Thr Leu 160 165 170 175 CAA GCA GTT ACA GTA CCA. 
GTT GTA AGT AAT TCG GAA TGT CAA CAA CAA 575 Gin Gly Val Thr Vai Pro Val Val Ser Asn Ser Giu Cys Gin Gin Gin 180 185 190 TTG CAA PAT CAG ACA ATC ACT GAC PAT ATG TTT TGT GCT GGT GAA TTA 623 Leu Gin Asn Gin Thr Ile Thr Asp Asn Met Phe Cys Ala Gly Giu Leu 195 200 205 GAA GCA GCA AAG GAC TCT TGT CAA GGA GAC AGT AGT GGT CCC ATG GTT 671 Glu Gly Gly Lys Asp Ser Cys Gin Gly Asp Ser Ser Gly Pro Met Val 210 215 220 GAC AGC GAG GAT ACT CAA GTA GGA ATT GTA TCC TGG GGA ATA GGA TGT 719 Asp Ser Giu Asp Thr Gin Val Gly Ile Val Ser Trp Gly Ile Gly Cys 225 230 235 GCT ACA CCC PAT TTA CCA GGA GTT TAT ACG CGA ATT GCT TCA TCG CCA 767 Ala Arg Pro Asn Leu Pro Gly Val Tyr Thr Arg Ile Ala Ser Ser Pro 240 245 250 255 ATT ACA CAT TTC NTA AGA CCA ATA ACC GGA GTT TAATATTATT TTATACATTT 820 Ile Arg Asp Phe Xaa Arg Arg Ile Thr Gly Vai 260 265 TTCACAAATA TGAGAACTAA TGAGAACTGT TGTATTGCTA TAATTCTTTG CAACATTGTG 880 CATGAATAAA TTATGPATAT AATTGTTAAA AAAAAAAAAA AAAA 924 INFORMATION FOR SEQ ID NO:24: WO 97/40058 WO 714058PCTIUS97/06121 191 SEQUENCE CHARACTERISTICS: LENGTH: 266 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: Ile Leu Leu Leu Val Leu Leu Aia Val Cys Phe Aia Ser Ala Lys Arg 1 5 10 Gly Pro Arg Lys His Val Arg Giu Thr Gin Lys Ser Leu Ala Ser Gly 20 25 Arg Ile Val Gly Gly Glu Ala Val Ser Ile Glu Xaa Tyr Gly Trp Gin 40 Val Ser Leu Gin Arg Phe Gly Ser His Phe Cys Gly Giy Ser Ile Ile 55 Ser Ser Xaa Trp Ile Leu Ser Ala Ala His Cys Phe Tyr Gly Thr Leu 70 75 Phe Pro Ile Gly Phe Ser Ala Arg Ala Gly Ser Ser Thr Val Asn Ser 90 Gly Gly Thr Val His Thr Ile Leu Tyr Trp Tyr Ile His Pro Asn Tyr 100 105 110 Asp Ser Gin Ser Thr Asp Phe Asp Val Ser Val Val Arg Leu Leu Ser 115 120 125 Ser Leu Asn Leu Asn Giy Gly Ser Ile Arg Pro Ala Arg Leu Val Asp 130 135 140 Ser Giy Thr Asp Leu Pro Ala Giy Giu Met Val Thr Vai Thr Giy Trp 145 150 155 160 Gly Arg Leu Ser Giu Asn Thr Ser Vai Pro Ser Pro Ser Thr Leu Gin 165 170 175 Gly Vai Thr Val Pro Vai Vai Ser Asn 
Ser Glu Cys Gln Gln Gln Leu  192
Gln Asn Gln Thr Ile Thr Asp Asn Met Phe Cys Ala Gly Glu Leu Glu  208
Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Ser Gly Pro Met Val Asp  224
Ser Glu Asp Thr Gln Val Gly Ile Val Ser Trp Gly Ile Gly Cys Ala  240
Arg Pro Asn Leu Pro Gly Val Tyr Thr Arg Ile Ala Ser Ser Pro Ile  256
Arg Asp Phe Xaa Arg Arg Ile Thr Gly Val  266
INFORMATION FOR SEQ ID NO:25:
SEQUENCE CHARACTERISTICS:
LENGTH: 924 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
TTTTTTTTTT TTTTTTTAAC AATTANATTC ATAATTTATT CATGCACAAT

ATTATAGCAA

ATTAAACTCC

AAACTCCTGG

TATCCTCGCT

CTAATTCACC

ATTCCGAATT

AAGTATTTTC

CAGTTCCAGA

ATAATAGTCG

TATACCAATA

TCGCAGAGAA

ATNTACTGGA

GCCATCCATA

TTTTTTGTGT

CCAATAATAC

NACAACAGTT

GGTTATTCGT

TAP.ATTGGGT

GTCAACCATG

AGCACAAAAC

ACTTACAACT

CGAAAGTCGT

ATCCACTAAC

AACTACAGAA

CAAAATTGTA

TCCAATCGGA

TATTATAGAT

GTNTTCAATG

TTCGCGAACA

CAGTAATAAA

CTCATTAGTT

CTTANGAAAT

CTAGCACATC

GGACCACTAC

ATATTGTCAG

GGTACTGTAA

CCCCATCCAG

CTAGCCGGTC

ACATCAAAGT

TGCACAGTTC

AATAACGTTC

CCTCCACAGA

CTCACTGCTT

TGTTTTCGTG

ATTT

CTCATATTTG

CTCTAATTGG

CTATTCCCCA

TGTCTCCTTG

TGATTGTCTG

CTCCTTGAAG

TTACTGTAAC

GAATAGAACC

CTGTACTTTG

CTCCTGAATT

CATAAAAGCA

AATGACTGCC

CACCACCCAC

GACCTNGTTT

TCAAAAATGT

CGATGAAGCA

GGATACAATT

ACAAGAGTC C

ATTTTGCAALT

AGTTGATGGC

CATCTCACCG

TCCATTCAAA

TGAATCATAA

CACAGTACTG

ATGAGCAGCT

AAAACGTTGT

AATACGC CCA

AGCTGAAGCA

GTTGCAAAGA

ATAAAATAAT

ATTCGCGTAT

CCTACTTGAG

TTTCCTCCTT

TGTTGTTGAC

GAGGGAACAG

GCTGGCAAAT

TTTAAAGAAG

TTTGGATGAA

CTGCCGGCTC

GAAAGAATCC

AGAGAAACTT

GAGGCAAGAC

AAGCATACTG

120 180 240 300 360 420 480 540 600 660 720 780 840 900 924
INFORMATION FOR SEQ ID NO:26:
SEQUENCE CHARACTERISTICS:
LENGTH: 699 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
NAME/KEY: CDS
LOCATION: 1..699
OTHER INFORMATION: /note= "At pos. bp 31, change A to N; at pos. bp 101, change A to N; at pos. bp 679, change A to N. At pos. aa 11, 34 and 227, substitute Xaa."
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
ATT GTG GGT GGT GAA GCA GTG AGC ATT GAA NAC TAT GGA TGG CAA GTT  48
Ile Val Gly Gly Glu Ala Val Ser Ile Glu Xaa Tyr Gly Trp Gln Val  16
TCT CTA CAA CGT TTT GGC AGT CAT TTC TGT GGA GGA TCT ATA ATA TCC  96
Ser Leu Gln Arg Phe Gly Ser His Phe Cys Gly Gly Ser Ile Ile Ser  32
AGT ANA TGG ATT CTT TCA GCT GCT CAT TGC TTT TAT GGA ACG TTA TTT  144
Ser Xaa Trp Ile Leu Ser Ala Ala His Cys Phe Tyr Gly Thr Leu Phe  48
CCG ATT GGA TTC TCT GCG AGA GCC GGC AGC AGT ACT GTG AAT TCA GGA  192
Pro Ile Gly Phe Ser Ala Arg Ala Gly Ser Ser Thr Val Asn Ser Gly  64
GGA ACT GTG CAT ACA ATT TTG TAT TGG TAT ATT CAT CCA AAT TAT GAT  240
Gly Thr Val His Thr Ile Leu Tyr Trp Tyr Ile His Pro Asn Tyr Asp  80
TCA CAA AGT ACA GAC TTT GAT GTT TCT GTA GTT CGA CTA TTA TCT TCT  288
Ser Gln Ser Thr Asp Phe Asp Val Ser Val Val Arg Leu Leu Ser Ser  96
TTA AAT TTG AAT GGA GGT TCT ATT CGA CCG GCT AGG TTA GTG GAT TCT  336
Leu Asn Leu Asn Gly Gly Ser Ile Arg Pro Ala Arg Leu Val Asp Ser  112
GGA ACT GAT TTG CCA GCC GGT GAG ATG GTT ACA GTA ACT GGA TGG GGA  384
Gly Thr Asp Leu Pro Ala Gly Glu Met Val Thr Val Thr Gly Trp Gly  128
CGA CTT TCG GAA AAT ACT TCT GTT CCC TCG CCA TCA ACT CTT CAA GGA  432
Arg Leu Ser Glu Asn Thr Ser Val Pro Ser Pro Ser Thr Leu Gln Gly  144
GTT ACA GTA CCA GTT GTA AGT AAT TCG GAA TGT CAA CAA CAA TTG CAA  480
Val Thr Val Pro Val Val Ser Asn Ser Glu Cys Gln Gln Gln Leu Gln  160
AAT CAG ACA ATC ACT GAC AAT ATG TTT TGT GCT GGT GAA TTA GAA GGA  528
Asn Gln Thr Ile Thr Asp Asn Met Phe Cys Ala Gly Glu Leu Glu Gly  176
GGA AAG GAC TCT TGT CAA GGA GAC AGT AGT GGT CCC ATG GTT GAC AGC  576
Gly Lys Asp Ser Cys Gln Gly Asp Ser Ser Gly Pro Met Val Asp Ser  192
GAG GAT ACT CAA GTA GGA ATT GTA TCC TGG GGA ATA GGA TGT GCT AGA  624
Glu Asp Thr Gln Val Gly Ile Val Ser Trp Gly Ile Gly Cys Ala Arg  208
CCC AAT TTA CCA GGA GTT TAT ACG CGA ATT GCT TCA TCG CCA ATT AGA  672
Pro Asn Leu Pro Gly Val Tyr Thr Arg Ile Ala Ser Ser Pro Ile Arg  224
GAT TTC NTA AGA CGA ATA ACC GGA GTT  699
Asp Phe Xaa Arg Arg Ile Thr Gly Val  233
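The feature note for SEQ ID NO:26 marks three uncertain bases as N and the corresponding residues as Xaa. The Python sketch below illustrates that convention on the first 16 codons of SEQ ID NO:26 as printed in this listing. The helper name `translate` and the table names are illustrative and not part of the listing, and mapping every codon that is not fully resolved (for example one containing N) to Xaa is a simplifying assumption.

```python
# Standard genetic code, indexed by base order T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# One-letter to three-letter residue names, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(cds):
    """Translate a coding sequence codon by codon.

    Whitespace is ignored. Any codon missing from the table (for example
    one containing N) is reported as Xaa.
    """
    seq = "".join(cds.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    residues = []
    for pos in range(0, len(seq), 3):
        one_letter = CODON_TABLE.get(seq[pos:pos + 3])  # None for NAC, ANA, ...
        residues.append(THREE_LETTER[one_letter] if one_letter else "Xaa")
    return residues


# First 16 codons of SEQ ID NO:26; base 31 is the N named in the feature note.
first_row = "ATT GTG GGT GGT GAA GCA GTG AGC ATT GAA NAC TAT GGA TGG CAA GTT"
protein = translate(first_row)
print(" ".join(protein))
```

In this sketch the eleventh residue comes out as Xaa, which agrees with the note's "aa 11".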
INFORMATION FOR SEQ ID NO:27:
SEQUENCE CHARACTERISTICS:
LENGTH: 233 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
Ile Val Gly Gly Glu Ala Val Ser Ile Glu Xaa Tyr Gly Trp Gln Val  16
Ser Leu Gln Arg Phe Gly Ser His Phe Cys Gly Gly Ser Ile Ile Ser  32
Ser Xaa Trp Ile Leu Ser Ala Ala His Cys Phe Tyr Gly Thr Leu Phe  48
Pro Ile Gly Phe Ser Ala Arg Ala Gly Ser Ser Thr Val Asn Ser Gly  64
Gly Thr Val His Thr Ile Leu Tyr Trp Tyr Ile His Pro Asn Tyr Asp  80
Ser Gln Ser Thr Asp Phe Asp Val Ser Val Val Arg Leu Leu Ser Ser  96
Leu Asn Leu Asn Gly Gly Ser Ile Arg Pro Ala Arg Leu Val Asp Ser  112
Gly Thr Asp Leu Pro Ala Gly Glu Met Val Thr Val Thr Gly Trp Gly  128
Arg Leu Ser Glu Asn Thr Ser Val Pro Ser Pro Ser Thr Leu Gln Gly  144
Val Thr Val Pro Val Val Ser Asn Ser Glu Cys Gln Gln Gln Leu Gln  160
Asn Gln Thr Ile Thr Asp Asn Met Phe Cys Ala Gly Glu Leu Glu Gly  176
Gly Lys Asp Ser Cys Gln Gly Asp Ser Ser Gly Pro Met Val Asp Ser  192
Glu Asp Thr Gln Val Gly Ile Val Ser Trp Gly Ile Gly Cys Ala Arg  208
Pro Asn Leu Pro Gly Val Tyr Thr Arg Ile Ala Ser Ser Pro Ile Arg  224
Asp Phe Xaa Arg Arg Ile Thr Gly Val  233
INFORMATION FOR SEQ ID NO:28:
SEQUENCE CHARACTERISTICS:
LENGTH: 699 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
AACTCCGGTT ATTCGTCTTA NGAAATCTCT AATTGGCGAT GAAGCAATTC GCGTATAAAC  60
TCCTGGTAAA TTGGGTCTAG CACATCCTAT TCCCCAGGAT ACAATTCCTA CTTGAGTATC  120
CTCGCTGTCA ACCATGGGAC CACTACTGTC TCCTTGACAA GAGTCCTTTC CTCCTTCTAA  180
TTCACCAGCA CAAAACATAT TGTCAGTGAT TGTCTGATTT TGCAATTGTT GTTGACATTC  240
CGAATTACTT ACAACTGGTA CTGTAACTCC TTGAAGAGTT GATGGCGAGG GAACAGAAGT  300
ATTTTCCGAA AGTCGTCCCC ATCCAGTTAC TGTAACCATC TCACCGGCTG GCAAATCAGT  360
TCCAGAATCC ACTAACCTAG CCGGTCGAAT AGAACCTCCA TTCAAATTTA AAGAAGATAA  420
TAGTCGAACT ACAGAAACAT CAAAGTCTGT ACTTTGTGAA TCATAATTTG GATGAATATA  480
CCAATACAAA ATTGTATGCA CAGTTCCTCC TGAATTCACA GTACTGCTGC CGGCTCTCGC  540
AGAGAATCCA ATCGGAAATA ACGTTCCATA AAAGCAATGA GCAGCTGAAA GAATCCATNT  600
ACTGGATATT ATAGATCCTC CACAGAAATG ACTGCCAAAA CGTTGTAGAG AAACTTGCCA  660
TCCATAGTNT TCAATGCTCA CTGCTTCACC ACCCACAAT  699
INFORMATION FOR SEQ ID NO:29:
SEQUENCE CHARACTERISTICS:
LENGTH: 1894 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: CDS
LOCATION: 335..1535
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
ACGCGACGGG CAGTCTCTTT CAGACCGCGG CCGAACGATG TTTTTGACGG TTTAATTTCA

ATTTTTGCAA CTTGTGACTT CGAATACACC CGTACGTGAC CTATCCACTT TACCATCGGC CGACCGTGAA AGTTGTGTTT GTGCTATTGA AAATTTCGTG CTCAATAATA AATATTCAGT TTTTTTGTCT AATCAGGATA TTTAAATTTA TGTGTACAAG TGTTAAACGC AATCTCGTCG CTCTTCCTAA TTTTCTTGTT GCCAATGCTG GCCTCAGTGC ATCGGCCGAC GCAAATGTGC TTGCAAAATA GAGAAATCCG GTGAAATCAC ACAT ATG TTA GCG ATC GTC CCG Met Leu Ala Ile Val Pro 1 TCA AAC GGA GCG TTC GCA GAC CAT GCC AAC CTT GGT GGA GTT GAT GGT Ser Asn Gly Ala Phe Ala Asp His Ala Asn Leu Gly Gly Val Asp Gly 15

CTT

Leu

GCG

Ala

GAG

Glu

ACG

Thr

GGA

Gly

TGC

Cys

CTG

Leu

CGG

Arg 135

GAA

Glu

TTC

Phe

AGT

Ser

ATA

Ile

GTG

Val 215

CAC

His

CCA

Pro 196 TCT GGT TTG ATT CTG GTC GCT GTT GCG ATA TCT TCG ATT GGA TAT Ser Gly Leu Ile Leu-Val Ala Val Ala Ile Ser Ser Ile Gly Tyr 30 GAC GCG GCG AAC GTT GCG CAG GAC GGA CAT CCG TCC AGC CAG CAA Asp Ala Ala Asn Val Ala Gin Asp Gly His Pro Ser Ser Gin Gin 45 CAG GAG ATC CTG CTG CTG AAT GCC TTA GCT CGC AGG AAC GGA GCG Gin Glu Ile Leu Leu Leu Asn Ala Leu Ala Arg Arg Asn Gly Ala 60 65 GGG CAC CAA TTT GAC GTA GAT CAA GAT TCA ATT ATG GAT ATG CTA Gly His Gin Phe Asp Val Asp Gin Asp Ser Ile Met Asp Met Leu 80 AGA ATG ATA CCT CAG ACT TGC CGG TAC AAA GGC GAA CGG TTC GAG Arg Met Ile Pro Gin Thr Cys Arg Tyr Lys Gly Glu Arg Phe Glu 90 95 100 GGT TTG TCA ATT TCG TGC GTC CTG GGC GGC GGA AAA CCT CTT GAC Gly Leu Ser Ile Ser Cys Val Leu Gly Gly Gly Lys Pro Leu Asp 105 110 115 TGC AGC GGC GGA ATG ATC TGG TCG TGC TGC GTC GAC AGG GAC ATT Cys Ser Gly Gly Met Ile Trp Ser Cys Cys Val Asp Arg Asp Ile 120 125 130 CCT GAG CCG CAG CAC CAG GGC GCT CTG CAG AAC GCA ACT TGT GGA Pro Glu Pro Gin His Gin Gly Ala Leu Gin Asn Ala Thr Cys Gly 140 145 150 TTG TAC ACG AGG TCT AAT AGA ATC GTA GGA GGT CAT TCA ACA GGA Leu Tyr Thr Arg Ser Asn Arg Ile Val Gly Gly His Ser Thr Gly 155 160 165 GGG TCT CAT CCT TGG CAG GCG GCT TTG ATC AAA TCA GGA TTT TTG Gly Ser His Pro Trp Gin Ala Ala Leu Ile Lys Ser Gly Phe Leu 170 175 180 AAA AAA TTA TCT TGC GGT GGC GCT TTA GTT AGC GAT CGA TGG GTT Lys Lys Leu Ser Cys Gly Gly Ala Leu Val Ser Asp Arg Trp Val 185 190 195 ACT GCT GCA CAT TGC GTT GCC ACG ACA CCA AAT TCG AAC CTG AAG Thr Ala Ala His Cys Val Ala Thr Thr Pro Asn Ser Asn Leu Lys 200 205 210 CGA TTG GGC GAA TGG GAC GTC CGC GAC CAC GAT GAG CGA CTG AAC Arg Leu Gly Glu Trp Asp Val Arg Asp His Asp Glu Arg Leu Asn 220 225 230 GAG GAA TAC GCA ATC GAA CGC AAA GAA GTT CAT CCT TCA TAT TCA Glu Glu Tyr Ala Ile Glu Arg Lys Glu Val His Pro Ser Tyr Ser 235 240 245 ACC GAT TTC CGG AAT GAT GTA GCC TTA GTG AAA CTC GAT AGA ACT Thr Asp Phe Arg Asn Asp Val Ala Leu Val Lys Leu Asp Arg Thr 250 255 260 448 496 544 592 640 688 736 784 832 880 928 
976 1024 1072 1120 WO 97/40058 PCT/US97/06121 197 GTT ATT TTC AAA CAA CAT ATT TTA CCT GTC TGC TTA CCT CAT AAG CAA Val Ile Phe Lys Gin His-Ile Leu Pro Val Cys Leu Pro His Lys Gin 265 270 275 ATG AAA CTG GCT GGA AAA ATG GCA ACA GTC GCC GGA TGG GGA CGG ACG Met Lys Leu Ala Gly Lys Met Ala Thr Val Ala Gly Trp Gly Arg Thr 280 285 290

AGG

Arg 295

GAG

Glu CAC GGG CAG AGC ACT GTG CCG GCT GTC TTA CAA GAA GTC GAT GTC His Gly Gin Ser Thr Val Pro Ala Val Leu Gin Glu Val Asp Val 300 305 310 GTG ATT CCG AAT GAA AGA TGC CAG AGG TGG TTC CGT GCT GCG GGT Val Ile Pro Asn Glu Arg Cys Gin Arg Trp Phe Arg Ala Ala Gly 315 320 325 CGA GAA ACC ATT CAC GAT GTC TTT CTC TGC GCC GGA TAT AAA GAG Arg Glu Thr Ile His Asp Val Phe Leu Cys Ala Gly Tyr Lys Glu 330 335 340 GGT CGT GAT TCA TGC CAA GGT GAT TCT GGA GGT CCT CTA ATA ATG Gly Arg Asp Ser Cys Gin Gly Asp Ser Gly Gly Pro Leu Ile Met 345 350 355

CGA

Arg

GGT

Gly CAG ATT GAG GGT AGA AGG ACC CTT GTG GGT CTA GTT TCT TGG GGC ATT Gln Ile Glu Gly Arg Arg Thr Leu Val Gly Leu Val Ser Trp Gly Ile 360 365 370 1168 1216 1264 1312 1360 1408 1456 1504 1555 1615 1675 1735 1795 1855 1894 GGA TGT GGT CGT GAG CAT TTA CCA GGC GTA TAT ACC AAT ATA CAA AAA Gly Cys Gly Arg Glu His Leu Pro Gly Val Tyr Thr Asn Ile Gin Lys 375 380 385 390 TTC ATA CCG TGG ATC GAC AAA GTA ATG GGA T AATTTTTATT CCATCGAGCT Phe Ile Pro Trp Ile Asp Lys Val Met Gly 395 400 TACCCAAAGT ATTTATTAAG TGTTAATCGA AAGTTCCAAT AATAAATTAA TTTAAAATTC TAAAGACGGG AATTTGAAAG ACCAAAAAGA CATACTTGTG ATTGTGTAAT TTTTATGAT9 AACTTTACAT CATCTGTGCT TAATTATTAA TTTGTATTAT TCTTGCAAAT ATTTCAAGA( TTACCGAAAA GTTTGCTAAT CGATAATGAT ATTTTAAGAA AAACAACTGC TGCTGATTC] GTCAATGTTA GAATAATTAT GTTTACTAAA TAATATTAAG TTCTGATTAG TAAATAAAT) GCAAAATTAT CTAAATATAT ATAAAAAAAA AAAAAAAAA r

A
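Comparing the printed text, the first 39 bases of SEQ ID NO:31 read as the reverse complement of the last 39 bases of SEQ ID NO:29. The Python sketch below checks that relationship on those two printed fragments only. It is an observation on the listing as printed rather than a statement taken from it, and the helper name `reverse_complement` is illustrative.

```python
# Complement table for the unambiguous bases plus N (N pairs with N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string, ignoring spaces."""
    cleaned = "".join(seq.split()).upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Last 39 bases of SEQ ID NO:29 and first 39 bases of SEQ ID NO:31, as printed.
seq29_tail = "GCAAAATTAT CTAAATATAT ATAAAAAAAA AAAAAAAAA"
seq31_head = "TTTTTTTTTT TTTTTTTATA TATATTTAGA TAATTTTGC"

matches = reverse_complement(seq29_tail) == "".join(seq31_head.split())
print(matches)
```

The same helper can be applied to any other pair of printed fragments in the listing to test whether one strand is given as the complement of another.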

INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 400 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Met Leu Ala Ile Val Pro Ser Asn Gly Ala Phe Ala Asp His Ala Asn 1 5 10 WO 97/40058 PCT/US97/06121 198 Leu Gly Gly Val Asp Gly Leu Ser Gly Leu Ile Leu Val Ala Val Ala 25 Ile Ser Ser Ile Gly Tyr Ala Asp Ala Ala Asn Val Ala Gin Asp Gly 40 His Pro Ser Ser Gin Gin Glu Gin Glu Ile Leu Leu Leu Asn Ala Leu 55 Ala Arg Arg Asn Gly Ala Thr Gly His Gin Phe Asp Val Asp Gin Asp 70 75 Ser Ile Met Asp Met Leu Gly Arg Met Ile Pro Gin Thr Cys Arg Tyr 85 90 Lys Gly Glu Arg Phe Glu Cys Gly Leu Ser Ile Ser Cys Val Leu Gly 100 105 110 Gly Gly Lys Pro Leu Asp Leu Cys Ser Gly Gly Met Ile Trp Ser Cys 115 120 125 Cys Val Asp Arg Asp Ile Arg Pro Glu Pro Gin His Gin Gly Ala Leu 130 135 140 Gin Asn Ala Thr Cys Gly Glu Leu Tyr Thr Arg Ser Asn Arg Ile Val 145 150 155 160 Gly Gly His Ser Thr Gly Phe Gly Ser His Pro Trp Gin Ala Ala Leu 165 170 175 Ile Lys Ser Gly Phe Leu Ser Lys Lys Leu Ser Cys Gly Gly Ala Leu 180 185 190 Val Ser Asp Arg Trp Val Ile Thr Ala Ala His Cys Val Ala Thr Thr 195 200 205 Pro Asn Ser Asn Leu Lys Val Arg Leu Gly Glu Trp Asp Val Arg Asp 210 215 220 His Asp Glu Arg Leu Asn His Glu Glu Tyr Ala Ile Glu Arg Lys Glu 225 230 235 240 Val His Pro Ser Tyr Ser Pro Thr Asp Phe Arg Asn Asp Val Ala Leu 245 250 255 Val Lys Leu Asp Arg Thr Val Ile Phe Lys Gin His Ile Leu Pro Val 260 265 270 Cys Leu Pro His Lys Gin Met Lys Leu Ala Gly Lys Met Ala Thr Val 275 280 285 Ala Gly Trp Gly Arg Thr Arg His Gly Gin Ser Thr Val Pro Ala Val 290 295 300 Leu Gin Glu Val Asp Val Glu Val Ile Pro Asn Glu Arg Cys Gin Arg 305 310 315 320 Trp Phe Arg Ala Ala Gly Arg Arg Glu Thr Ile His Asp Val Phe Leu 325 330 335 Cys Ala Gly Tyr Lys Glu Gly Gly Arg Asp ser Cys Gin Gly Asp Ser 340 345 350 WO 97/40058 WO 9740058PCTIUS97/06121.

199 Gly Gly Pro Leu Ile Met Gin Ile Glu Gly Arg Arg Thr Leu Val Gly 355 360 365 Leu Val Ser Trp Gly Ile Gly Cys Gly Arg Giu His Leu Pro Giy Val 370 375 380 Tyr Thr Asn Ile Gin Lys Phe Ile Pro Trp Ile Asp Lys Val Met Gly 385 390 395 400 INFORMATION FOR SEQ ID NO:31: SEQUENCE CHARACTERISTICS: LENGTH: i894 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3i: TTTTTTTTTT TTTTTTTATA TATATTTAGA TAATTTTGCT ATTTATTTAC TAATCAGAAC

TTAATATTAT

TCTTAAAATA

TAATACAAAT

ACAAGTATGT

TTGGAACTTT

TTAGTAAACA

TCATTATCGA

TAATAATTAA

CTTTTTGG;TC

CGATTAACAC

TCCCATTACT TTGTCGATCC ATGCTCACGA CCACATCCAA AATCTGCATT ATTAGAGGAC ATATCCGGCG CAGAGAAAGA, CCTCTGGCAT CTTTCATTCG AGTGCTCTGC CCGTGCCTCG TTTCATTTGC TTATGAGGTA ATCGAGTTTC ACTAAGGCTA TTCTTTGCGT TCGATTGCGT CCATTCGCCC AATCGCACCT AGTTATAACC CATCGATCGC TCCTGATTTG ATCAAAGCCG TACGATTCTA TTAGACCTCG

TAATTATTCT

TTAGCAAACT

GCACAGATGA

TTTCAAATTC

TTAATP.AATA

ACGGTATGAA

TGCCCCAAGA

CT CCAGAAT C

CATCGTGAAT

GAATCACCTC

TCCGTCCCCA

AGCAGACAGG

CATCATTCCG

ATTCCTCGTG

TCAGGTTCGA

TAACTAAAGC

CCTGCCAAGG

TGTACAATTC

AACATTGACT

TTTCGGTAAC

TGTAAAGTTA

CCGTCTTTAG

CTTTGGGTAA

TTTTTGTATA

AACTAGACCC

ACCTTGGCAT

GGTTTCTCGT

GACATCGACT

TCCGGCGACT

TAAAATATGT

GAAATCGGTT

GTTCAGTCGC

ATTTGGTGTC

GCCACCGCAA

ATGAGACCCG

TCCACAAGTT

GACGCAGCAC

GACGCACGAA

GAATCAGCAG

TCTTGAAATA

ATCATAAAPA

AATTTTAAAT

GCTCGATGGA

TTGGTATATA

ACAAGGGTCC

GAATCACGAC

CGACCCGCAG

TCTTGTAAGA

GTTGCCATTT

TGTTTGAAAA

GGTGAATATG

TCATCGTGGT

GT GGCAACGC

GATAATTTTT

AATCCTGTTG

GCGTTCTGCA

GACCAGATCA

ATTGACAAAC

CAGTTGTTTT

TTTGCAAGAA

TTACACAATC

TAATTTATTA

ATAAAAATTA

CGCCTGGTAA

TTCTACCCTC

CACCCTCTTT

CACGGAACCA

CAGCCGGCAC,

TTCCAGCCAG

TAACAGTTCT

AAGGATGAAC

CGCGGACGTC

AATGTGCAGC

TACT CAAAAA AAT GAC CTC C

GAGCGCCCTG

TTCCGCCGCT

CGCACTCGAA

120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 GTGCTGCGGC TCAGGCCGAA TGTCCCTGTC GCACAGGTCA AGAGGTTTTC CGCCGCCCAG CCGTTCGCCT TTGTACCGGC AAGTCTGAGG TATCATTCTT CCTAGCATAT

ATCTTGATCT

CAGCAGGATC

GTCCGCATAT

AACTCCACCA

ATGTGTGATT

AGGCCAGHAT

CACATAAATT

ATTTTCAATA

TACGGGTGTA

TCGGCCGCGG

ACGTCAAATT

TCCTGCTCTT

CCAATCGAAG

AGGTTGGCAT

TCACCGGATT

TGGCAACAAG

TAAATATCCT

GCACAAACAC

TTCGAAGTCA

TCTGAAAGAG

GGTGCCCCGT

GCTGGCTGGA

ATATCGCAAC

GGTCTGCGAA

TCTCTATTTT

AAAATTAGGA

GATTAGACAA

AACTTTCACG

CAAGTTGCAA

ACTGCCCGTC

CGCTCCGTTC

CGGATGTCCG

AGCGACCAGA

CGCTCCGTTT

GCAAGCACAT

AGAGCGACGA

AAAAACTGAA

GTCGGCCGAT

AAATTGAAAT

GCGT

CTGCGAGCTA

TCCTGCGCAA

ATCAAACCAG

GACGGGACGA

TTGCGTCGGC

GATTGCGTTT

TATTTATTAT

GGTAAAGTGG

TAAACCGTCA

CCATAATTGA

AGGCATTCAG

CGTTCGCCGC

AAAGACCATC

TCGCTAACAT

CGATGCACTG

AACACTTGTA

TGAGCACGAA

ATAGGTCACG

AAAACATCGT

1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1894 INFORMATION FOR SEQ ID NO:32: SEQUENCE CHARACTERISTICS: LENGTH: 1200 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1200 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: ATG TTA GCG ATC GTC CCG TCA AAC GGA GCG TTC GCA GAC CAT GCC AAC Met Leu Ala Ile Val Pro Ser Asn Gly Ala Phe Ala Asp His Ala Asn 1 5 10 CTT GGT GGA GTT GAT GGT CTT TCT GGT TTG ATT CTG GTC, GCT GTT GCG Leu Gly Giy Val Asp Giy Leu Ser Giy Leu Ile Leu Val Ala Val Ala 25 ATA TCT TCG ATT GGA TAT GCG GAC GCG GCG AAC GTT GCG CAG GAC GGA Ile Ser Ser Ile Gly Tyr Ala Asp Ala Ala Asn Val Ala Gln Asp Gly 40 CAT CCG TCC AGC CAG CAA GAG CAG GAG ATC CTG CTG CTG PAT GCC TTA His Pro Ser Ser Gin Gin Giu Gin Giu Ile Leu Leu Leu Asn Ala Leu 55 GCT CGC AGG AAC GGA GCG ACG GGG CAC CPA TTT GAC GTA GAT CAA GAT Ala Arg Arg Asn Gly Ala Thr Gly His Gin Phe Asp Val Asp Gin Asp 70 75 TCA ATT ATG GAT ATG CTA GGA AGA ATG ATA CCT CAG ACT TGC CGG TAC Ser Ile Met Asp Met Leu Gly Arg Met Ile Pro Gin Thr Cys Arg Tyr 85 90 48 96 144 192 240 288 WO 97/40058 PCT/US97/06121 201 AAA GGC GAA CGG TTC GAG TGC GGT TTG TCA ATT TCG TGC GTC CTG GGC Lys Gly Glu Arg Phe Glu Cys Gly Leu Ser Ile Ser Cys Val Leu Gly 100 105 110 GGC GGA AAA CCT CTT GAC CTG TGC AGC GGC GGA ATG ATC TGG TCG TGC Gly Gly Lys Pro Leu Asp Leu Cys Ser Gly Gly Met Ile Trp Ser Cys 115 120 125 TGC GTC GAC AGG GAC ATT CGG CCT GAG CCG CAG CAC CAG GGC GCT CTG Cys Val Asp Arg Asp Ile Arg Pro Glu Pro Gin His Gin Gly Ala Leu 130 135 140

CAG

Gin 145 AAC GCA ACT TGT GGA GAA TTG TAC ACG AGG TCT AAT AGA ATC GTA Asn Ala Thr Cys Gly Glu Leu Tyr Thr Arg Ser Asn Arg Ile Val 150 155 160 GGA GGT CAT TCA ACA GGA TTC GGG TCT CAT CCT TGG CAG GCG GCT TTG Gly Gly His Ser Thr Gly Phe Gly Ser His Pro Trp Gin Ala Ala Leu 165 170 175 ATC AAA TCA Ile Lys Ser GGA TTT TTG AGT AAA AAA TTA TCT TGC GGT GGC GCT TTA Gly Phe Leu Ser Lys Lys Leu Ser Cys Gly Gly Ala Leu 180 185 190 GTT AGC GAT CGA TGG GTT ATA ACT GCT GCA CAT TGC GTT GCC ACG ACA Val Ser Asp Arg Trp Val Ile Thr Ala Ala His Cys Val Ala Thr Thr 195 200 205 CCA AAT TCG AAC CTG AAG GTG CGA TTG GGC GAA TGG GAC GTC CGC GAC Pro Asn Ser Asn Leu Lys Val Arg Leu Gly Glu Trp Asp Val Arg Asp 210 215 220

CAC

His 225 GAT GAG CGA CTG AAC CAC GAG GAA TAC GCA ATC GAA CGC AAA GAA Asp Glu Arg Leu Asn His Glu Glu Tyr Ala Ile Glu Arg Lys Glu 230 235 240 GTT CAT CCT TCA TAT TCA CCA ACC GAT TTC CGG AAT GAT GTA GCC TTA Val His Pro Ser Tyr Ser Pro Thr Asp Phe Arg Asn Asp Val Ala Leu 245 250 255 GTG AAA CTC GAT AGA ACT GTT ATT TTC AAA CAA CAT ATT TTA CCT GTC Val Lys Leu Asp Arg Thr Val Ile Phe Lys Gin His Ile Leu Pro Val 260 265 270 TGC TTA CCT CAT AAG CAA ATG AAA CTG GCT GGA AAA ATG GCA ACA GTC Cys Leu Pro His Lys Gin Met Lys Leu Ala Gly Lys Met Ala Thr Val 275 280 285 GCC GGA TGG GGA CGG ACG AGG CAC GGG CAG AGC ACT GTG CCG GCT GTC Ala Gly Trp Gly Arg Thr Arg His Gly Gin Ser Thr Val Pro Ala Val 290 295 300 720 768 816 864 912 960 1008 1056

TTA

Leu 305 CAA GAA GTC GAT GTC GAG GTG ATT CCG AAT GAA AGA TGC CAG AGG Gin Glu Val Asp Val Glu Val Ile Pro Asn Glu Arg Cys Gin Arg 310 315 320 TGG TTC CGT GCT GCG GGT CGA CGA GAA ACC ATT CAC GAT GTC TTT CTC Trp Phe Arg Ala Ala Gly Arg Arg Glu Thr Ile His Asp Val Phe Leu 325 330 335 TGC GCC GGA TAT AAA GAG GGT GGT CGT GAT TCA TGC CAA GGT GAT TCT WO 97/40058 PCT/US97/06121 202 Cys Ala Gly Tyr Lys Glu Gly Gly Arg Asp Ser Cys Gin Gly Asp Ser 340 -345 350 GGA GGT CCT CTA ATA ATG CAG ATT GAG GGT AGA AGG ACC CTT GTG GGT Gly Gly Pro Leu Ile Met Gin Ile Glu Gly Arg Arg Thr Leu Val Gly 355 360 365 CTA GTT TCT TGG GGC ATT GGA TGT GGT CGT GAG CAT TTA CCA GGC GTA Leu Val Ser Trp Gly Ile Gly Cys Gly Arg Glu His Leu Pro Gly Val 370 375 380 TAT ACC AAT ATA CAA AAA TTC ATA CCG TGG ATC GAC AAA GTA ATG GGA Tyr Thr Asn Ile Gin Lys Phe Ile Pro Trp Ile Asp Lys Val Met Gly 385 390 395 400 INFORMATION FOR SEQ ID NO:33: SEQUENCE CHARACTERISTICS: LENGTH: 400 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein 1104 1152 1200 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: Met Leu Ala Ile Val Pro Ser Asn Gly Ala Phe Ala Asp His Ala Asn 1 Leu Ile His Ala Ser Lys Gly Cys Gin 145 Gly Gly Gly Ser Ser Pro Ser Arg Arg Ile Met Gly Glu Gly Lys 115 Val Asp 130 Asn Ala Gly His Val Ile Ser Asn Asp Arg 100 Pro Arg Thr Ser Asp Gly Gin Gly Met 85 Phe Leu Asp Cys Thr 165 Gly Tyr Gin Ala 70 Leu Glu Asp Ile Gly 150 Gly Leu Ala Glu 55 Thr Gly Cys Leu Arg 135 Glu Phe Ser Gly Leu 25 Asp Ala Ala 40 Gin Glu Ile Gly His Gin Arg Met Ile 90 Gly Leu Ser 105 Cys Ser Gly 120 Pro Glu Pro Leu Tyr Thr Gly Ser His 170 Ile Asn Leu Phe 75 Pro Ile Gly Gin Arg 155 Pro Leu Val Val Ala Leu Leu Asp Val Gin Thr Ser Cys Met Ile 125 His Gin 140 Ser Asn Trp Gin Ala Gin Asn Asp Cys Val 110 Trp Gly Arg Ala Val Asp Ala Gin Arg Leu Ser Ala Ile Ala Ala Gly Leu Asp Tyr Gly Cys Leu Val 160 Leu 175 Ile Lys Ser Gly 180 Phe Leu Ser Lys Lys Leu Ser Cys Gly Gly Ala Leu 190 WO 97/40058 PCT/US97/06121 203 Val Ser Asp 195 Arg Trp Val Ile 
Thr Ala Ala His Cys Pro Asn 210 Ser Asn Leu Lys 200 Arg Val 205 Asp Ala Thr Thr Val Arg Asp Val 215 His Leu Gly Glu Trp 220 Ile His 225 Val Asp Glu Arg Leu Asn 230 Ser Glu Glu Tyr Ala 235 Arg Glu Arg Lys Glu 240 His Pro Ser Pro Thr Asp Phe 250 Lys Asn Asp Val Ala Leu 255 Val Lys Leu Cys Leu Pro 275 Ala Gly Trp Asp 260 His Thr Val Ile Gin His Ile Lys Gin Met Lys 280 His Ala Gly Lys Met 285 Val Leu Pro Val 270 Ala Thr Val Pro Ala Val Gly Arg Thr Arg 295 Gly Gin Ser 290 Thr 300 Glu Leu 305 Trp Gin Glu Val Asp Val 310 Gly Glu Val Ile Pro Arg Cys Gin Phe Arg Ala Arg Arg Glu Thr 330 Asp His Asp Val Phe Leu 335 Cys Ala Gly Gly Gly Pro 355 Leu Val Ser 370 Glu Gly Gly Ser Cys Gin Leu Ile Met Gin Gly Arg Arg Gly Asp Ser 350 Leu Val Gly Pro Gly Val Trp Gly Ile Gly 375 Phe Gly Arg Glu Tyr 385 Thr Asn Ile Gin Ile Pro Trp Ile 395 Lys Val Met Gly 400 INFORMATION FOR SEQ ID NO:34: SEQUENCE CHARACTERISTICS: LENGTH: 1200 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: TCCCATTACT TTGTCGATCC ACGGTATGAA TTTTTGTATA TTGGTATATA ATGCTCACGA CCACATCCAA TGCCCCAAGA AACTAGACCC ACAAGGGTCC AATCTGCATT ATTAGAGGAC CTCCAGAATC ACCTTGGCAT GAATCACGAC ATATCCGGCG CAGAGAAAGA CATCGTGAAT GGTTTCTCGT CGACCCGCAG CCTCTGGCAT CTTTCATTCG GAATCACCTC GACATCGACT TCTTGTAAGA AGTGCTCTGC CCGTGCCTCG TCCGTCCCCA TCCGGCGACT GTTGCCATTT

CGCCTGGTAA

TTCTACCCTC

CACCCTCTTT

CACGGAACCA

CAGCCGGCAC

TTCCAGCCAG

120 180 240 300 360 WO 97/40058 WO 9740058PCTIUS97/06121 204 TTTCATTTGC TTATGAGGTA ATCGAGTTTC ACTAAGGCTA TTCTTTGCGT TCGATTGCGT CCATTCGCCC AATCGCACCT AGTTATAACC C-ATCGATCGC TCCTGATTTG ATCAAAGCCG TACGATTCTA TTAGACCTCG GTGCTGCGGC TCAGGCCGAA GCACAGGTCA AGAGGTTTTC CCGTTCGCCT TTGTACCGGC ATCTTGATCT ACGTCAAATT CAGCAGGATC TCCTGCTCTT GTCCGCATAT CCAATCGAAG AACTCCACCA AGGTTGGCAT

AGCAGACAGG

CATCATTCCG

ATTCCTCGTG

TCAGGTTCGA

TAACTAAAGC

CCTGCCAAGG

TGTACAATTC

TGTCCCTGTC

CGCCGCCCAG

AAGTCTGAGG

GGTGCCCCGT

GCTGGCTGGA

ATATCGCAAC

GGTCTGCGAA

TAAAATATGT

GAAATCGGTT

GTTCAGTCGC

ATTTGGTGTC

GCCACCGCAA

ATGAGACCCG

TCCACAAGTT

GACGCAGCAC

GACGCACGAA

TATCATTCTT

CGCTCCGTTC

CGGATGTCCG

AGCGACCAGA

TGTTTGAAAA

GGTGAATATG

TCATCGTGGT

GTGGCAACGC

GATAATTTTT

AATCCTGTTG

GCGTTCTGCA

GACCAGATCA

ATTGACAAAC

CCTAGCATAT

CTGCGAGCTA

TCCTGCGCAA

ATCAAACCAG

TAACAGTTCT

AAGGATGAAC

CGCGGACGTC

AATGTGCAGC

TACTCAAAAA

AATGACCTCC

GAGCGCCCTG

TTCCGCCGCT

CGCACTCGAA

CCATAATTGA

AGGCATTCAG

CGTTCGCCGC

AAAGACCATC

420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 CGCTCCGTTT GACGGGACGA TCGCTAACAT INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 726 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: CDNA (ix) FEATURE: NPAME/KEY: CDS LOCATION: 1. .726 (xi) SEQUENCE DESCRIPTION: SEQ ID ATC GTA GGA GGT CAT TCA ACA GGA TTC GGG TCT CAT CCT TGG CAG GCG Ile Val Gly Gly His Ser Thr Gly Phe Gly Ser His Pro Trp Gln Ala 1 5 10 GCT TTG ATC AAA TCA GGA TTT TTG AGT AAA AAA TTA TCT TGC GGT GGC Ala Leu Ile Lys Ser Gly Phe Leu Ser Lys Lys Leu Ser Cys Gly Gly 25 GCT TTA GTT AGC GAT CGA TGG GTT ATA ACT GCT GCA CAT TGC GTT GCC Ala Leu Val Ser Asp Arg Trp Val Ile Thr Al1a Ala His Cys Val Ala 40 ACG ACA CCA AAT TCG AAC CTG AAG GTG CGA TTG GGC GAA TGG GAC GTC Thr Thr Pro Asn Ser Asn Leu Lys Val Arg Leu Gly Glu Trp Asp Val 55 CGC GAC CAC GAT GAG CGA CTG AAC CAC GAG GAA TAC GCA ATC GAA CGC WO 97/40058 PCT/US97/061Z1.

205 Arg Asp His Asp Glu Arg Leu Asn His Glu Glu Tyr Ala Ile Glu Arg 70- 75 AAA GAA GTT CAT CCT TCA TAT TCA CCA ACC GAT TTC CGG AAT GAT GTA 288 Lys Glu Val His Pro Ser Tyr Ser Pro Thr Asp Phe Arg Asn Asp Val 85 90 GCC TTA GTG AAA CTC GAT AGA ACT GTT ATT TTC AAA CAA CAT ATT TTA 336 Ala Leu Val Lys Leu Asp Arg Thr Val Ile Phe Lys Gin His Ile Leu 100 105 110 CCT GTC TGC TTA CCT CAT AAG CAA ATG AAA CTG GCT GGA AAA ATG GCA 384 Pro Val Cys Leu Pro His Lys Gin Met Lys Leu Ala Gly Lys Met Ala 115 120 125 ACA GTC GCC GGA TGG GGA CGG ACG AGG CAC GGG CAG AGC ACT GTG CCG 432 Thr Val Ala Gly Trp Gly Arg Thr Arg His Gly Gin Ser Thr Val Pro 130 135 140 GCT GTC TTA CAA GAA GTC GAT GTC GAG GTG ATT CCG AAT GAA AGA TGC 480 Ala Val Leu Gin Glu Val Asp Val Glu Val Ile Pro Asn Glu Arg Cys 145 150 155 160 CAG AGG TGG TTC CGT GCT GCG GGT CGA CGA GAA ACC ATT CAC GAT GTC 528 Gin Arg Trp Phe Arg Ala Ala Gly Arg Arg Glu Thr Ile His Asp Val 165 170 175 TTT CTC TGC GCC GGA TAT AAA GAG GGT GGT CGT GAT TCA TGC CAA GGT 576 Phe Leu Cys Ala Gly Tyr Lys Glu Gly Gly Arg Asp Ser Cys Gin Gly 180 185 190 GAT TCT GGA GGT CCT CTA ATA ATG CAG ATT GAG GGT AGA AGG ACC CTT 624 Asp Ser Gly Gly Pro Leu Ile Met Gin Ile Glu Gly Arg Arg Thr Leu 195 200 205 GTG GGT CTA GTT TCT TGG GGC ATT GGA TGT GGT CGT GAG CAT TTA CCA 672 Val Gly Leu Val Ser Trp Gly Ile Gly Cys Gly Arg Glu His Leu Pro 210 215 220 GGC GTA TAT ACC AAT ATA CAA AAA TTC ATA CCG TGG ATC GAC AAA GTA 720 Gly Val Tyr Thr Asn Ile Gin Lys Phe Ile Pro Trp Ile Asp Lys Val 225 230 235 240 ATG GGA 726 Met Gly INFORMATION FOR SEQ ID NO:36: SEQUENCE CHARACTERISTICS: LENGTH: 242 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: Ile Val Gly Gly His Ser Thr Gly Phe Gly Ser His Pro Trp Gin Ala 1 5 10 WO 97/40058 PCT/US97/06121.

206 Ala Leu Ile Lys Ser Gly Phe Leu Ser Lys Lys Leu Ser Cys Gly Gly -25 Ala Leu Val Ser Asp Arg Trp Val Ile Thr Ala Ala His Cys Val Ala 40 Thr Thr Pro Asn Ser Asn Leu Lys Val Arg Leu Gly Glu Trp Asp Val 55 Arg Asp His Asp Glu Arg Leu Asn His Glu Glu Tyr Ala Ile Glu Arg 70 75 Lys Glu Val His Pro Ser Tyr Ser Pro Thr Asp Phe Arg Asn Asp Val 85 90 Ala Leu Val Lys Leu Asp Arg Thr Val Ile Phe Lys Gin His Ile Leu 100 105 110 Pro Val Cys Leu Pro His Lys Gin Met Lys Leu Ala Gly Lys Met Ala 115 120 125 Thr Val Ala Gly Trp Gly Arg Thr Arg His Gly Gin Ser Thr Val Pro 130 135 140 Ala Val Leu Gin Glu Val Asp Val Glu Val Ile Pro Asn Glu Arg Cys 145 150 155 160 Gin Arg Trp Phe Arg Ala Ala Gly Arg Arg Glu Thr Ile His Asp Val 165 170 175 Phe Leu Cys Ala Gly Tyr Lys Glu Gly Gly Arg Asp Ser Cys Gin Gly 180 185 190 Asp Ser Gly Gly Pro Leu Ile Met Gin Ile Glu Gly Arg Arg Thr Leu 195 200 205 Val Gly Leu Val Ser Trp Gly Ile Gly Cys Gly Arg Glu His Leu Pro 210 215 220 Gly Val Tyr Thr Asn Ile Gin Lys Phe Ile Pro Trp Ile Asp Lys Val 225 230 235 240 Met Gly INFORMATION FOR SEQ ID NO:37: SEQUENCE CHARACTERISTICS: LENGTH: 428 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..428 OTHER INFORMATION: /note= "At pos. bp 50, change A to N. At pos. aa 17, substitute Xaa." WO 97/40058 PCT/US97/06121.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
GTA ACC AAA TAT GAA AAC GAT ATC GAA GGT TTA AAA GTT AGG GTT GGA 48
Val Thr Lys Tyr Glu Asn Asp Ile Glu Gly Leu Lys Val Arg Val Gly
1 5 10 15
ANC AAT GAG CAT AAC AAA GGT GGG CGT TTA TAC GAC ATT AAA GAA ATT 96
Xaa Asn Glu His Asn Lys Gly Gly Arg Leu Tyr Asp Ile Lys Glu Ile
20 25 30
AAA AAA CAT CCA AGA TAT AAC GAT CGA ACC AGA TAC GAT TTT GAT GTC 144
Lys Lys His Pro Arg Tyr Asn Asp Arg Thr Arg Tyr Asp Phe Asp Val
35 40 45
GCT TTA TTA CGC ATT GCA AAG CCA ATT GCA TAC ACT GCT TGC ACT GTT 192
Ala Leu Leu Arg Ile Ala Lys Pro Ile Ala Tyr Thr Ala Cys Thr Val
50 55 60
GTT CCT GTA GCA TTG GCA GAA ACT GGA AAA GAA GTT CCA GAA GGC GCA 240
Val Pro Val Ala Leu Ala Glu Thr Gly Lys Glu Val Pro Glu Gly Ala
65 70 75 80
CTC GTT AGT GTC ACA GGA TGG GGG GCT ACT ATG GTG GGC GGC CCA GCA 288
Leu Val Ser Val Thr Gly Trp Gly Ala Thr Met Val Gly Gly Pro Ala
85 90 95
TCA ACG CAT CTA AAA GGT GTT AAG GTT CCA ATC GTG TCA AAT GAA GAA 336
Ser Thr His Leu Lys Gly Val Lys Val Pro Ile Val Ser Asn Glu Glu
100 105 110
TGC AAC AAA AAT TAT ACC ATT CCT GGA GGT CTG GAT GAC AAA ATT TCA 384
Cys Asn Lys Asn Tyr Thr Ile Pro Gly Gly Leu Asp Asp Lys Ile Ser
115 120 125
GAC AGC ATG TTT TGC GCT GGT TTC CCT GAA GGC GGA AAG GAC TC 428
Asp Ser Met Phe Cys Ala Gly Phe Pro Glu Gly Gly Lys Asp
130 135 140

INFORMATION FOR SEQ ID NO:38:
SEQUENCE CHARACTERISTICS:
LENGTH: 142 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:
Val Thr Lys Tyr Glu Asn Asp Ile Glu Gly Leu Lys Val Arg Val Gly
1 5 10 15
Xaa Asn Glu His Asn Lys Gly Gly Arg Leu Tyr Asp Ile Lys Glu Ile
20 25 30
Lys Lys His Pro Arg Tyr Asn Asp Arg Thr Arg Tyr Asp Phe Asp Val
35 40 45

Ala Leu Leu Arg Ile Ala Lys Pro Ile Ala Tyr Thr Ala Cys Thr Val
50 55 60
Val Pro Val Ala Leu Ala Glu Thr Gly Lys Glu Val Pro Glu Gly Ala
65 70 75 80
Leu Val Ser Val Thr Gly Trp Gly Ala Thr Met Val Gly Gly Pro Ala
85 90 95
Ser Thr His Leu Lys Gly Val Lys Val Pro Ile Val Ser Asn Glu Glu
100 105 110
Cys Asn Lys Asn Tyr Thr Ile Pro Gly Gly Leu Asp Asp Lys Ile Ser
115 120 125
Asp Ser Met Phe Cys Ala Gly Phe Pro Glu Gly Gly Lys Asp
130 135 140

INFORMATION FOR SEQ ID NO:39:
SEQUENCE CHARACTERISTICS:
LENGTH: 428 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:
GAGTCCTTTC CGCCTTCAGG GAAACCAGCG CAAAACATGC TGTCTGAAAT TTTGTCATCC 60
AGACCTCCAG GAATGGTATA ATTTTTGTTG CATTCTTCAT TTGACACGAT TGGAACCTTA 120
ACACCTTTTA GATGCGTTGA TGCTGGGCCG CCCACCATAG TAGCCCCCCA TCCTGTGACA 180
CTAACGAGTG CGCCTTCTGG AACTTCTTTT CCAGTTTCTG CCAATGCTAC AGGAACAACA 240
GTGCAAGCAG TGTATGCAAT TGGCTTTGCA ATGCGTAATA AAGCGACATC AAAATCGTAT 300
CTGGTTCGAT CGTTATATCT TGGATGTTTT TTAATTTCTT TAATGTCGTA TAAACGCCCA 360
CCTTTGTTAT GCTCATTGNT TCCAACCCTA ACTTTTAAAC CTTCGATATC GTTTTCATAT 420
TTGGTTAC 428
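
By way of illustration only, SEQ ID NO:39 as listed appears to be the reverse complement of the coding strand listed as SEQ ID NO:37. The following Python sketch shows one way such a relationship might be checked on a short stretch of the two sequences. The helper name `reverse_complement` and the variable names are hypothetical and are not part of the sequence listing; the sketch assumes only the bases A, C, G, T and the ambiguity symbol N.

```python
# Illustrative sketch only; names here are hypothetical, not from the listing.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string.

    Whitespace (codon or 10-mer spacing) is ignored; N pairs with N.
    """
    cleaned = "".join(seq.split())
    return cleaned.translate(COMPLEMENT)[::-1]


# First 24 bases of SEQ ID NO:37 as printed (codon-spaced).
coding_start = "GTA ACC AAA TAT GAA AAC GAT ATC"
# Last 24 bases of SEQ ID NO:39 as printed (10-mer spaced).
complement_end = "GATATC GTTTTCATAT TTGGTTAC"

# True when the two printed stretches are reverse complements of each other.
print(reverse_complement(coding_start) == "".join(complement_end.split()))
```

The same helper could be applied to whole sequences once the column spacing and position counts are stripped; an ambiguous base such as the N at position 50 of SEQ ID NO:37 is carried through unchanged.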

INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 841 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 2..728 WO 97/40058 PCT/US97/06121 209 OTHER INFORMATION: /note= "At pos. bp 186, change A to N; at pos. bp 627, change A to N. At pos. aa 62 and 209, substitute Xaa." (xi) SEQUENCE DESCRIPTION: SEQ ID A GAT CAT CGA ATA GTA GGA GGT GAA GAT GTA GAT ATT TCA ACT TGT Asp His Arg Ile Val Gly Gly Glu Asp Val Asp Ile Ser Thr Cys 1 5 10 GGA TGG CAA GTT TCG TTT CAC AAT AGG AAA GGA CAT TTT TGT GGA GGG Gly Trp Gin Val Ser Phe His Asn Arg Lys Gly His Phe Cys Gly Gly 20 25 TCC ATC ATT GGC AAA GAA TGG ATT CTA ACT GCT GCG CAT TGT GTA ACC Ser Ile Ile Gly Lys Glu Trp Ile Leu Thr Ala Ala His Cys Val Thr 40 AAA TAT GAA AAC GAT ATC GAA GGT TTA AAA GTT AGG GTT GGA ANC AAT Lys Tyr Glu Asn Asp Ile Glu Gly Leu Lys Val Arg Val Gly Xaa Asn 55 GAG CAT AAC AAA GGT GGG CGT TTA TAC GAC ATT AAA GAA ATT AAA AAA Glu His Asn Lys Gly Gly Arg Leu Tyr Asp Ile Lys Glu Ile Lys Lys 70 CAT CCA AGA TAT AAC GAT CGA ACC AGA TAC GAT TTT GAT GTC GCT TTA His Pro Arg Tyr Asn Asp Arg Thr Arg Tyr Asp Phe Asp Val Ala Leu 85 90

TTA

Leu

GTA

Val

AGT

Ser

CAT

His

AAA

Lys 160

ATG

Met

GAT

Asp

GTG

val CGC ATT GCA AAG CCA ATT GCA TAC ACT GCT TGC ACT GTT GTT CCT Arg Ile Ala Lys Pro Ile Ala Tyr Thr Ala Cys Thr Val Val Pro 100 105 110 GCA TTG GCA GAA ACT GGA AAA GAA GTT CCA GAA GGC GCA CTC GTT Ala Leu Ala Glu Thr Gly Lys Glu Val Pro Glu Gly Ala Leu Val 115 120 125 GTC ACA GGA TGG GGG GCT ACT ATG GTG GGC GGC CCA GCA TCA ACG Val Thr Gly Trp Gly Ala Thr Met Val Gly Gly Pro Ala Ser Thr 130 135 140 CTA AAA GGT GTT AAG GTT CCA ATC GTG TCA AAT GAA GAA TGC AAC Leu Lys Gly Val Lys Val Pro Ile Val Ser Asn Glu Glu Cys Asn 145 150 155 AAT TAT ACC ATT CCT GGA GGT CTG GAT GAC AAA ATT TCA GAC AGC Asn Tyr Thr Ile Pro Gly Gly Leu Asp Asp Lys Ile Ser Asp Ser 165 170 175 TTT TGC GCT GGT TTC CCT GAA GGC GGA AAG GAC TCG TGT CAA GGA Phe Cys Ala Gly Phe Pro Glu Gly Gly Lys Asp Ser Cys Gin Gly 180 185 190 AGC GGT GGG CCT GTA GTG GAT GAA AAT AGG GTT CAG GTC GGA ATT Ser Gly Gly Pro Val Val Asp Glu Asn Arg Val Gin Val Gly Ile 195 200 205 TNT TGG GGC GAA GGC TGT GCT TTA GCA GGA AAA CCA GGC GTT TAT Xaa Trp Gly Glu Gly Cys Ala Leu Ala Gly Lys Pro Gly val Tyr 210 215 220 334 382 430 478 526 574 622 670 WO 97/40058 PCT/US97/06121 210 GCA AAA GTT TCA CAT CCT GAC GTA AAA AGG TTT ATT GAA ACC GTA GCA 718 Ala Lys Val Ser His Pro-Asp Val Lys Arg Phe Ile Glu Thr Val Ala 225 230 235 GGA ATC AAA T AAAATTTGTT AGAAAAAATG TAGACAAGTT GTATAAACTA 768 Gly Ile Lys 240 TCAATGAAAT TGTTTTATTT TTGGAAATAA AATATAATTT ATGAAAAAAA AAAAAAAAAA 828 AAAAAAAAAA AAA 841 INFORMATION FOR SEQ ID NO:41: SEQUENCE CHARACTERISTICS: LENGTH: 242 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: Asp His Arg Ile Val Gly Gly Glu Asp Val Asp Ile Ser Thr Cys Gly 1 5 10 Trp Gin Val Ser Phe His Asn Arg Lys Gly His Phe Cys Gly Gly Ser 25 Ile Ile Gly Lys Glu Trp Ile Leu Thr Ala Ala His Cys Val Thr Lys 40 Tyr Glu Asn Asp Ile Glu Gly Leu Lys Val Arg Val Gly Xaa Asn Glu 55 His Asn Lys Gly Gly Arg Leu Tyr Asp Ile Lys Glu Ile Lys Lys His 65 70 75 Pro Arg Tyr Asn Asp Arg Thr Arg Tyr 
Asp Phe Asp Val Ala Leu Leu 90 Arg Ile Ala Lys Pro Ile Ala Tyr Thr Ala Cys Thr Val Val Pro Val 100 105 110 Ala Leu Ala Glu Thr Gly Lys Glu Val Pro Glu Gly Ala Leu Val Ser 115 120 125 Val Thr Gly Trp Gly Ala Thr Met Val Gly Gly Pro Ala Ser Thr His 130 135 140 Leu Lys Gly Val Lys Val Pro Ile Val Ser Asn Glu Glu Cys Asn Lys 145 150 155 160 Asn Tyr Thr Ile Pro Gly Gly Leu Asp Asp Lys Ile Ser Asp Ser Met 165 170 175 Phe Cys Ala Gly Phe Pro Glu Gly Gly Lys Asp Ser Cys Gin Gly Asp WO 97/40058 PCTIUS97/06121.

211 Ser Gly Gly Pro Val Val Asp Glu Asn Arg Val Gin Val Gly Ile Val 195 200 205 Xaa Trp Gly Glu Gly Cys Ala Leu Ala Gly Lys Pro Gly Val Tyr Ala 210 215 220 Lys Val Ser His Pro Asp Val Lys Arg Phe Ile Giu Thr Val Ala Gly 225 230 235 240 Ile Lys INFORMATION FOR SEQ ID NO:42: SEQUENCE CHARACTERISTICS: LENGTH: 841 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: TTTTTTTTTT TTTTTTTTTT TTTTTTTTTT CANAAATTAT ATTTTATTTC CAAAAATAAA ACAATTTCAT TGANAGTTTA TCCTGCTACG GTTTCAATAA TGGTTTTCCT GCTAAAGCAC TTCATCCACT ACAGGCCCAC ACCAGCGCAA AACATGCTGT TTTGTTGCAT TCTTCATTTG TGGGCCGCCC ACCATAGTAG TTCTTTTCCA GTTTCTGCCA CTTTGCAATG CGTAATAAAG ATGTTTTTTA ATTTCTTTAA AACCCTAACT TTTAAACCTT TAGAATCCAT TCTTTGCCAA

TACAACTTGT

ACCTTTTTAC

AGCCTTCGCC

CGCTATCTCC

CTGAAATTTT

ACACGATTGG

CCCCCCATCC

ATGCTACAGG

CGACATCAAA

TGTCGTATAA

CGATATCGTT

TGATGGACCC

CTACATTTTT

GTCAGGATGT

CCAANACACA

TTGACACGAG

GTCATCCAGA

AACCTTAACA

TGTGACACTA

AACAACAGTG

ATCGTATCTG

ACGCCCACCT

TTCATATTTG

TCTAACAAAT

GAAACTTTTG

ATTCCGACCT

TCCTTTCCGC

CCTCCAGGAA

CCTTTTAGAT

ACGAGTGCGC

CAAGCAGTGT

GTTCGATCGT

TTGTTATGCT

GTTACACAAT

TTTATTTGAT

CATAAACGCC

GAACCCTATT

CTTCAGGGAA

TGGTATAATT

GCGTTGATGC

CTTCTGGAAC

ATGCAATTGG

TATATCTTGG

CATTGNTTCC

GCGCAGCAGT

TATTGTGAAA

120 180 240 300 360 420 480 540 600 660 720 780 840 841 TCCACAAAAA TGTCCTTTCC CGAAACTTGC CATCCACAAG TTGAAATATC TACATCTTCA CCTCCTACTA TTCGATGATC

T

INFORMATION FOR SEQ ID NO:43: SEQUENCE CHARACTERISTICS: LENGTH: 717 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear WO 97/40058 PCTIUS97/06121 212 (ii) MOLECULE TYPE: DNA (genomic) (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..717 OTHER INFORMATION: /note= "At pos. bp 176, change A to N; at pos. bp 617, change A to N. At pos. aa. 59 and 206, substitute Xaa." (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: ATA GTA GGA GGT GAA GAT GTA GAT ATT TCA ACT TGT GGA TGG CAA GTT 48 Ile Val Gly Gly Glu Asp Val Asp Ile Ser Thr Cys Gly Trp Gin Val 1 5 10 TCG TTT CAC AAT AGG AAA GGA CAT TTT TGT GGA GGG TCC ATC ATT GGC 96 Ser Phe His Asn Arg Lys Gly His Phe Cys Gly Gly Ser Ile Ile Gly 25 AAA GAA TGG ATT CTA ACT GCT GCG CAT TGT GTA ACC AAA TAT GAA AAC 144 Lys Glu Trp Ile Leu Thr Ala Ala His Cys Val Thr Lys Tyr Glu Asn 40 GAT ATC GAA GGT TTA AAA GTT AGG GTT GGA ANC AAT GAG CAT AAC AAA 192 Asp Ile Glu Gly Leu Lys Val Arg Val Gly Xaa Asn Glu His Asn Lys 50 55 GGT GGG CGT TTA TAC GAC ATT AAA GAA ATT AAA AAA CAT CCA AGA TAT 240 Gly Gly Arg Leu Tyr Asp Ile Lys Glu Ile Lys Lys His Pro Arg Tyr 70 75 AAC GAT CGA ACC AGA TAC GAT TTT GAT GTC GCT TTA TTA CGC ATT GCA 288 Asn Asp Arg Thr Arg Tyr Asp Phe Asp Val Ala Leu Leu Arg Ile Ala 90 AAG CCA ATT GCA TAC ACT GCT TGC ACT GTT GTT CCT GTA GCA TTG GCA 336 Lys Pro Ile Ala Tyr Thr Ala Cys Thr Val Val Pro Val Ala Leu Ala 100 105 110 GAA ACT GGA AAA GAA GTT CCA GAA GGC GCA CTC GTT AGT GTC ACA GGA 384 Glu Thr Gly Lys Glu Val Pro Glu Gly Ala Leu Val Ser Val Thr Gly 115 120 125 TGG GGG GCT ACT ATG GTG GGC GGC CCA GCA TCA ACG CAT CTA AAA GGT 432 Trp Gly Ala Thr Met Val Gly Gly Pro Ala Ser Thr His Leu Lys Gly 130 135 140 GTT AAG GTT CCA ATC GTG TCA AAT GAA GAA TGC AAC AAA AAT TAT ACC 480 Val Lys Val Pro Ile Val Ser Asn Glu Glu Cys Asn Lys Asn Tyr Thr 145 150 155 160 ATT CCT GGA GGT CTG GAT GAC AAA ATT TCA GAC AGC ATG TTT TGC GCT 528 Ile Pro Gly Gly Leu Asp Asp Lys Ile Ser Asp Ser Met Phe Cys Ala 165 170 175 GGT TTC CCT GAA GGC GGA AAG GAC TCG TGT 
CAA GGA GAT AGC GGT GGG 576 Gly Phe Pro Glu Gly Gly Lys Asp Ser Cys Gin Gly Asp Ser Gly Gly 180 185 190 CCT GTA GTG GAT GAA AAT AGG GTT CAG GTC GGA ATT GTG TNT TGG GGC 624 Pro Val Val Asp Glu Asn Arg Val Gin Val Gly Ile Val Xaa Trp Gly 195 200 205 WO 97/40058 PCT/US97/06121.

213 GAA GGC TGT GCT TTA GCA GGA AAA CCA GGC GTT TAT GCA AAA GTT TCA 672 Glu Gly Cys Ala Leu Ala-Gly Lys Pro Gly Val Tyr Ala Lys Val Ser 210 215 220 CAT CCT GAC GTA AAA AGG TTT ATT GAA ACC GTA GCA GGA ATC AAA 717 His Pro Asp Val Lys Arg Phe Ile Glu Thr Val Ala Gly Ile Lys 225 230 235 INFORMATION FOR SEQ ID NO:44: SEQUENCE CHARACTERISTICS: LENGTH: 239 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: Ile Val Gly Gly Glu Asp Val Asp Ile Ser Thr Cys Gly Trp Gin Val 1 5 10 Ser Phe His Asn Arg Lys Gly His Phe Cys Gly Gly Ser Ile Ile Gly 25 Lys Glu Trp Ile Leu Thr Ala Ala His Cys Val Thr Lys Tyr Glu Asn 40 Asp Ile Glu Gly Leu Lys Val Arg Val Gly Xaa Asn Glu His Asn Lys 55 Gly Gly Arg Leu Tyr Asp Ile Lys Glu Ile Lys Lys His Pro Arg Tyr 70 75 Asn Asp Arg Thr Arg Tyr Asp Phe Asp Val Ala Leu Leu Arg Ile Ala 85 90 Lys Pro Ile Ala Tyr Thr Ala Cys Thr Val Val Pro Val Ala Leu Ala 100 105 110 Glu Thr Gly Lys Glu Val Pro Glu Gly Ala Leu Val Ser Val Thr Gly 115 120 125 Trp Gly Ala Thr Met Val Gly Gly Pro Ala Ser Thr His Leu Lys Gly 130 135 140 Val Lys Val Pro Ile Val Ser Asn Glu Glu Cys Asn Lys Asn Tyr Thr 145 150 155 160 Ile Pro Gly Gly Leu Asp Asp Lys Ile Ser Asp Ser Met Phe Cys Ala 165 170 175 Gly Phe Pro Glu Gly Gly Lys Asp Ser Cys Gin Gly Asp Ser Gly Gly 180 185 190 Pro Val Val Asp Glu Asn Arg Val Gin Val Gly Ile Val Xaa Trp Gly 195 200 205 Glu Gly Cys Ala Leu Ala Giv LVs Pro Glv Val Tvr Ala Lvs Val Ser 210s-V215S22 210 215 220 .J WO 97/40058 PCTIUS97/06121.

214 His Pro Asp Val Lys Arg Phe le Glu Thr Val Ala Gly Ile Lys 225 230- 235 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 717 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID TTTGATTCCT GCTACGGTTT CAATAAACCT TTTTACGTCA GGATGTGAAA

AACGCCTGGT

CCTATTTTCA

AGGGAAACCA

ATAATTTTTG

TGATGCTGGG

TGGAACTTCT

AATTGGCTTT

TCTTGGATGT

GNTTCCAACC

AGCAGTTAGA

GTGAAACGAA

TTTCCTGCTA

TCCACTACAG

GCGCAAAACA

TTGCATTCTT

CCGCCCACCA

TTTCCAGTTT

GCAATGCGTA

TTTTTAATTT

CTAACTTTTA

ATCCATTCTT

ACTTGCCATC

AAGCACAGCC

GCCCACCGCT

TGCTGTCTGA

CATTTGACAC

TAGTAGCCCC

CTGCCAATGC

ATAAAGCGAC

CTTTAATGTC

AACCTTCGAT

TGCCAATGAT

TTCGCCCCAA

ATCTCCTTGA

AATTTTGTCA

GATTGGAACC

CCATCCTGTG

TACAGGAACA

ATCAAAATCG

GTATAAACGC

ATCGTTTTCA

GGACCCTCCA

NACACAATTC

CACGAGTCCT

TCCAGACCTC

TTAACACCTT

ACACTAACGA

ACAGTGCAAG

TATCTGGTTC

CCACCTTTGT

TATTTGGTTA

CAAAAATGTC

CTTTTGCATA

CGACCTGAAC

TTCCGCCTTC

CAGGAATGGT

TTAGATGCGT

GTGCGCCTTC

CAGTGTATGC

GATCGTTATA

TATGCTCATT

CACAATGCGC

CTTTCCTATT

CACAAGTTGA AATATCTACA TCTTCACCTC CTACTAT

INFORMATION FOR SEQ ID NO:46:
SEQUENCE CHARACTERISTICS:
LENGTH: 20 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:
GGACAAACTG TTCATTGCAG 20

INFORMATION FOR SEQ ID NO:47:
SEQUENCE CHARACTERISTICS:
LENGTH: 20 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:
CCCTCATTTG TCGTAACTCC 20

INFORMATION FOR SEQ ID NO:48:
SEQUENCE CHARACTERISTICS:
LENGTH: 21 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:
GGCTAGGTTA GTGGATTCTG G 21

INFORMATION FOR SEQ ID NO:49:
SEQUENCE CHARACTERISTICS:
LENGTH: 27 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:
GCAAATCAGT TCCAGAATCC ACTAACC 27

INFORMATION FOR SEQ ID NO:50:
SEQUENCE CHARACTERISTICS:
LENGTH: 23 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:
CAGGGCGCTC TGCAGAACGC AAC 23

INFORMATION FOR SEQ ID NO:51:
SEQUENCE CHARACTERISTICS:
LENGTH: 22 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:
ATTCCTCGTG GTTCAGTCGC TC 22

INFORMATION FOR SEQ ID NO:52:
SEQUENCE CHARACTERISTICS:
LENGTH: 23 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:
GGCAAGTTTC GTTTCACAAT AGG 23

INFORMATION FOR SEQ ID NO:53:
SEQUENCE CHARACTERISTICS:
LENGTH: 24 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53:
TCCAACCCTA ACTTTTAAAC CTTC 24

INFORMATION FOR SEQ ID NO:54:
SEQUENCE CHARACTERISTICS:
LENGTH: 31 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:
CACAGGATCC AATAATTTGT GGTCAAAATG C 31

INFORMATION FOR SEQ ID NO:55:
SEQUENCE CHARACTERISTICS:
LENGTH: 37 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:
AAAAAGAAAG CTTCTTTAAT TTTCTGACAT TGTCGTG 37

INFORMATION FOR SEQ ID NO:56:
SEQUENCE CHARACTERISTICS:
LENGTH: 31 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:
GCGGGATCCT ATTGTGGGTG GTGAAGCAGT G 31

INFORMATION FOR SEQ ID NO:57:
SEQUENCE CHARACTERISTICS:
LENGTH: 35 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:
GACGGTACCA TGTATAAAAT AATATTAAAC TCCGG 35

INFORMATION FOR SEQ ID NO:58:
SEQUENCE CHARACTERISTICS:
LENGTH: 34 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:
CCGGGATCCT ATGTTAGCGA TCGTCCCGTC AAAC 34

INFORMATION FOR SEQ ID NO:59:
SEQUENCE CHARACTERISTICS:
LENGTH: 32 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59:
CCGGAATTCT TATCCCATTA CTTTGTCGAT CC 32

INFORMATION FOR SEQ ID NO:60:
SEQUENCE CHARACTERISTICS:
LENGTH: 32 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:60:
GCGGGATCCA ATAGTAGGAG GTGAAGATGT AG 32

INFORMATION FOR SEQ ID NO:61:
SEQUENCE CHARACTERISTICS:
LENGTH: 36 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:61:
CCGGAATTCT TCTAACAAAT TTTATTTGAT TCCTGC 36

INFORMATION FOR SEQ ID NO:62:
SEQUENCE CHARACTERISTICS:
LENGTH: 26 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62:
GGATCCAATC GTTGGAGGTG AAGATG 26

INFORMATION FOR SEQ ID NO:63:
SEQUENCE CHARACTERISTICS:
LENGTH: 27 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63:
GAATTCGAAA TCCACTTAAA CATTAGC 27

INFORMATION FOR SEQ ID NO:64:
SEQUENCE CHARACTERISTICS:
LENGTH: 33 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:64:
GATAAGGATC CGTTACCAGA TTCTTTCGAC TGG 33

INFORMATION FOR SEQ ID NO:65:
SEQUENCE CHARACTERISTICS:
LENGTH: 33 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:65:
TTATCAAGCT TCCATTTACA TGCCGTAAAA ATC 33

INFORMATION FOR SEQ ID NO:66:
SEQUENCE CHARACTERISTICS:
LENGTH: 923 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: CDS LOCATION: 2..802
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:66:
A TTA ACA ATG ATG AAA CTT TTG GTA GTT TTT GCG ATT TTC GCT CAA 46
Leu Thr Met Met Lys Leu Leu Val Val Phe Ala Ile Phe Ala Gln 1 5 10
ATC AGT TTT GTT TTT GGA AAT AAT GTA ACA GAA TTC GAT GAC CGA ATC 94
Ile Ser Phe Val Phe Gly Asn Asn Val Thr Glu Phe Asp Asp Arg Ile 20 25
GTT GGA GGT GAA GAT GTT GAT ATA TCA ACT TGT GGT TGG CAA ATT TCA 142
Val Gly Gly Glu Asp Val Asp Ile Ser Thr Cys Gly Trp Gln Ile Ser 40
TTT CAA AGT GAA AAC CTT CAT TTT TGT GGA GGA TCA ATT ATT GCA CCA 190
Phe Gln Ser Glu Asn Leu His Phe Cys Gly Gly Ser Ile Ile Ala Pro 55
AAA TGG ATT CTA ACT GCT GCA CAC TGT GTT GAA TGG TTG AAA AAG CCG 238
Lys Trp Ile Leu Thr Ala Ala His Cys Val Glu Trp Leu Lys Lys Pro 70
CTC AAA GAC ATA ACC GTA CGT ATA GGA AGC AGT ATA CGT AAC AAA GGT 286
Leu Lys Asp Ile Thr Val Arg Ile Gly Ser Ser Ile Arg Asn Lys Gly 85 90
GGT CGA GTT CAT AAA GTA ATA GAT TTC CAC ATG CAT CCC TCG TAC AAT 334
Gly Arg Val His Lys Val Ile Asp Phe His Met His Pro Ser Tyr Asn 100 105 110
AAG AGG GCG GAT TAT GAT TTT GAC GTT GCT GTA CTA GAA CTT GAA AAA 382
Lys Arg Ala Asp Tyr Asp Phe Asp Val Ala Val Leu Glu Leu Glu Lys 115 120 125
CCA GTC TCA TAT ACG GTT TGT ACA GTA GTA TCA GTA GAT TTA GCC GAA 430
Pro Val Ser Tyr Thr Val Cys Thr Val Val Ser Val Asp Leu Ala Glu 130 135 140
AGT GGA ACT GAA GTT AAA CCT GGA GCA ATA CTT AGT GTC ACT GGA TGG 478
Ser Gly Thr Glu Val Lys Pro Gly Ala Ile Leu Ser Val Thr Gly Trp 145 150 155
GGT GCA ACT AAG GAA GGT GGT GGC GGA ACT TTG CAA CTA CAA GGT GTG 526
Gly Ala Thr Lys Glu Gly Gly Gly Gly Thr Leu Gln Leu Gln Gly Val 160 165 170 175
AAA GTT CCA GCT ATC TCT CCC AAA GAT TGT GCT AAG GGA TAT CCA CCT 574
Lys Val Pro Ala Ile Ser Pro Lys Asp Cys Ala Lys Gly Tyr Pro Pro 180 185 190
TCT GGA GGT AAA GAC AAA ATT ACA GAC AGC ATG TTA TGT GCT GGT CTT 622
Ser Gly Gly Lys Asp Lys Ile Thr Asp Ser Met Leu Cys Ala Gly Leu 195 200 205
CCT GAA GGA GGT AAA GAT TCC TGC CAA GGC GAC AGT GGC GGT CCA CTG 670
Pro Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu 210 215 220
GTA GAT GAA AAT AGA AAG CAA GTA GGA GTG GTT TCT TGG GGT CAA GGA 718
Val Asp Glu Asn Arg Lys Gln Val Gly Val Val Ser Trp Gly Gln Gly 225 230 235
TGT GCC AGA CCA GGA AAA CCA GGA ATT TAT GCT AAA GTG TCA CAC CCC 766
Cys Ala Arg Pro Gly Lys Pro Gly Ile Tyr Ala Lys Val Ser His Pro 240 245 250 255

GAA ATC AGA AAA TTT ATT GAA AAA TAT GCT AAT GTT TAAGTGGATT 812
Glu Ile Arg Lys Phe Ile Glu Lys Tyr Ala Asn Val 260 265
TCATTTTCAA TATAATGTGA TTTAAGATAC TCTTTAATGT TATGATATGA ATTGTGATAA 872
ATTAAATAAT AAAGATTGAA GAAGTGATAA AAAAAAAAAA AAAAAAAAAA A 923
INFORMATION FOR SEQ ID NO:67:
SEQUENCE CHARACTERISTICS: LENGTH: 267 amino acids TYPE: amino acid TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:67:
Leu Thr Met Met Lys Leu Leu Val Val Phe Ala Ile Phe Ala Gln Ile 1 5 10
Ser Phe Val Phe Gly Asn Asn Val Thr Glu Phe Asp Asp Arg Ile Val 25
Gly Gly Glu Asp Val Asp Ile Ser Thr Cys Gly Trp Gln Ile Ser Phe 40
Gln Ser Glu Asn Leu His Phe Cys Gly Gly Ser Ile Ile Ala Pro Lys 50 55
Trp Ile Leu Thr Ala Ala His Cys Val Glu Trp Leu Lys Lys Pro Leu 70 75
Lys Asp Ile Thr Val Arg Ile Gly Ser Ser Ile Arg Asn Lys Gly Gly 90
Arg Val His Lys Val Ile Asp Phe His Met His Pro Ser Tyr Asn Lys 100 105 110
Arg Ala Asp Tyr Asp Phe Asp Val Ala Val Leu Glu Leu Glu Lys Pro 115 120 125
Val Ser Tyr Thr Val Cys Thr Val Val Ser Val Asp Leu Ala Glu Ser 130 135 140
Gly Thr Glu Val Lys Pro Gly Ala Ile Leu Ser Val Thr Gly Trp Gly 145 150 155 160
Ala Thr Lys Glu Gly Gly Gly Gly Thr Leu Gln Leu Gln Gly Val Lys 165 170 175
Val Pro Ala Ile Ser Pro Lys Asp Cys Ala Lys Gly Tyr Pro Pro Ser 180 185 190
Gly Gly Lys Asp Lys Ile Thr Asp Ser Met Leu Cys Ala Gly Leu Pro 195 200 205
Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu Val 210 215 220

Asp Glu Asn Arg Lys Gln Val Gly Val Val Ser Trp Gly Gln Gly Cys 225 230 235 240
Ala Arg Pro Gly Lys Pro Gly Ile Tyr Ala Lys Val Ser His Pro Glu 245 250 255
Ile Arg Lys Phe Ile Glu Lys Tyr Ala Asn Val 260 265
INFORMATION FOR SEQ ID NO:68:
SEQUENCE CHARACTERISTICS: LENGTH: 15 amino acids TYPE: amino acid STRANDEDNESS: TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68:
Ile Val Gly Gly Glu Asp Val Asp Ile Ser Thr Cys Gly Trp Cys 1 5 10
INFORMATION FOR SEQ ID NO:69:
SEQUENCE CHARACTERISTICS: LENGTH: 34 amino acids TYPE: amino acid STRANDEDNESS: TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:69:
Ile Val Gly Gly Glu Asp Val Asp Ile Ser Thr Cys Gly Trp Gln Ile 1 5 10
Ser Phe Gln Ser Glu Asn Leu His Phe Cys Gly Gly Ser Ile Ile Ala 25
Pro Lys
INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS: LENGTH: 15 amino acids TYPE: amino acid STRANDEDNESS: TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(ix) FEATURE: NAME/KEY: Protein LOCATION: 1..15

OTHER INFORMATION: /note= "At pos. 1, Xaa = His or Arg; pos. 2, Val or Pro; pos. 3, Gly or Ala or Ser; pos. 4, Tyr or Gly; pos. 5, Glu or Asn; pos. 6, Asp or Lys; pos. 7, Val or Arg; pos. 8, Asp or Ala; pos. 13, Asp or Pro."
(ix) FEATURE: NAME/KEY: Xaa = His or Arg LOCATION: 1
(ix) FEATURE: NAME/KEY: Xaa = Val or Pro LOCATION: 2
(ix) FEATURE: NAME/KEY: Xaa = Gly or Ala or Ser LOCATION: 3
(ix) FEATURE: NAME/KEY: Xaa = Tyr or Gly LOCATION: 4
(ix) FEATURE: NAME/KEY: Xaa = Glu or Asn LOCATION: 5
(ix) FEATURE: NAME/KEY: Xaa = Asp or Lys LOCATION: 6
(ix) FEATURE: NAME/KEY: Xaa = Val or Arg LOCATION: 7
(ix) FEATURE: NAME/KEY: Xaa = Asp or Ala LOCATION: 8
(ix) FEATURE: NAME/KEY: Xaa = Asp or Pro LOCATION: 13
(xi) SEQUENCE DESCRIPTION: SEQ ID
Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asp Tyr Asp Phe Xaa Val Ala 1 5 10
INFORMATION FOR SEQ ID NO:71:
SEQUENCE CHARACTERISTICS: LENGTH: 12 amino acids TYPE: amino acid

STRANDEDNESS: TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(ix) FEATURE: NAME/KEY: Protein LOCATION: 1..12 OTHER INFORMATION: /note= "At pos. aa 1, Xaa = Ile or Gln; pos. 4, Tyr or Gly; pos. 5, Glu or Asn or Thr; pos. 6, Asp or Met or Pro; pos. 8, Lys or Asp; pos. 10, Asn or Ser; pos. 11, Met or Thr or Asn; pos. 12, Phe or Cys."
(ix) FEATURE: NAME/KEY: Xaa = Ile or Gln LOCATION: 1
(ix) FEATURE: NAME/KEY: Xaa = Tyr or Gly LOCATION: 4
(ix) FEATURE: NAME/KEY: Xaa = Glu or Asn or Thr LOCATION: 5
(ix) FEATURE: NAME/KEY: Xaa = Asp or Met or Pro LOCATION: 6
(ix) FEATURE: NAME/KEY: Xaa = Lys or Asp LOCATION: 8
(ix) FEATURE: NAME/KEY: Xaa = Asn or Ser LOCATION: 10
(ix) FEATURE: NAME/KEY: Xaa = Met or Thr or Asn LOCATION: 11
(ix) FEATURE: NAME/KEY: Xaa = Phe or Cys LOCATION: 12
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:71:
Xaa Val Gly Xaa Xaa Xaa Val Xaa Ile Xaa Xaa Xaa 1 5
INFORMATION FOR SEQ ID NO:72:
SEQUENCE CHARACTERISTICS: LENGTH: 15 amino acids TYPE: amino acid

STRANDEDNESS: TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(ix) FEATURE: NAME/KEY: Protein LOCATION: 1..15 OTHER INFORMATION: /note= "At pos. aa 1, Xaa = His or Arg; at pos. 3, Xaa = Ala or Ser."
(ix) FEATURE: NAME/KEY: Xaa = His or Arg LOCATION: 1
(ix) FEATURE: NAME/KEY: Xaa = Ala or Ser LOCATION: 3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72:
Xaa Pro Xaa Tyr Asn Lys Arg Ala Asp Tyr Asp Phe Asp Val Ala 1 5 10
INFORMATION FOR SEQ ID NO:73:
SEQUENCE CHARACTERISTICS: LENGTH: 28 amino acids TYPE: amino acid

STRANDEDNESS: TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(ix) FEATURE: NAME/KEY: Protein LOCATION: 1..28 OTHER INFORMATION: /note= "At pos. aa 1, Xaa probably Cys."
(ix) FEATURE: NAME/KEY: Xaa probably Cys LOCATION: 1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73:
Xaa Pro Pro Pro Glu Met Leu Gly Gly Pro Ser Ile Phe Ile Phe Pro 1 5 10
Pro Lys Pro Lys Asp Asp Leu Leu Ile Lys Arg Lys
INFORMATION FOR SEQ ID NO:74:
SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:74:
CTTTCCTCAC AATACCACCA AGGAAGC 27
INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID
CTTGTACGAT TGTCTCAACA GGC 23
INFORMATION FOR SEQ ID NO:76:
SEQUENCE CHARACTERISTICS: LENGTH: 573 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: CDS LOCATION: 1..573
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:76:
CGA TCT TTT AGA CAA GCG AAA TTG ATA ACG AAG TTT TCG AAG TCG GAT 48
Arg Ser Phe Arg Gln Ala Lys Leu Ile Thr Lys Phe Ser Lys Ser Asp 1 5 10
GAA GTA AAA ACC TTG CGT TGG TTT CCC CGG TCC CAG GAT CAG GAA CAG 96
Glu Val Lys Thr Leu Arg Trp Phe Pro Arg Ser Gln Asp Gln Glu Gln 25
TTG CAC TTT ACC CCA ATG AGG GAA TTC GTG CAT CCC CAT TTT ACC GAA 144
Leu His Phe Thr Pro Met Arg Glu Phe Val His Pro His Phe Thr Glu 40
CAT ATT GAT GAA GAA TTC CAC CGA TTC ATC AAT AAA CAC GGA AAA ATT 192
His Ile Asp Glu Glu Phe His Arg Phe Ile Asn Lys His Gly Lys Ile 55
TAT AAT AAA AAT GAA GAA CAT CAT TTC CGC AAA GAA ATT TTC AGA CTA 240
Tyr Asn Lys Asn Glu Glu His His Phe Arg Lys Glu Ile Phe Arg Leu 65 70 75
AAC TTG AGG TAC ATT TTT TCT AAG AAT CGT GCA AAT TTG GGA TAC ACT 288
Asn Leu Arg Tyr Ile Phe Ser Lys Asn Arg Ala Asn Leu Gly Tyr Thr 90
TTG ACT GTT AAC CAT TTG GCT GAT CGT ACT GAA GCT GAA CTT AAG GCT 336
Leu Thr Val Asn His Leu Ala Asp Arg Thr Glu Ala Glu Leu Lys Ala 100 105 110
TTG AGA GGA CAC AGA CCT TCC TCC GGT TAT AAT GGC GGT TTA CCC TTT 384
Leu Arg Gly His Arg Pro Ser Ser Gly Tyr Asn Gly Gly Leu Pro Phe 115 120 125
CCT CAC AAT ACC ACC AAG GAA GCA AGA AAT TTA CCA GAT TCT TTC GAC 432
Pro His Asn Thr Thr Lys Glu Ala Arg Asn Leu Pro Asp Ser Phe Asp 130 135 140
TGG CGA ATT TAT GGA GCT GTT ACT CCA GTT AAA GAT CAA TCT GTT TGT 480
Trp Arg Ile Tyr Gly Ala Val Thr Pro Val Lys Asp Gln Ser Val Cys 145 150 155 160
GGT TCC TGC TGG TCT TTC GGA ACA ATT GGA GCA ATC GAA GGT GCA TAT 528
Gly Ser Cys Trp Ser Phe Gly Thr Ile Gly Ala Ile Glu Gly Ala Tyr 165 170 175
TTC TTG AAA ACG GCG GTA ATC TGT ACG ATG TCT CAC AGC TTG ATG 573
Phe Leu Lys Thr Ala Val Ile Cys Thr Met Ser His Ser Leu Met 180 185 190
INFORMATION FOR SEQ ID NO:77:
SEQUENCE CHARACTERISTICS: LENGTH: 191 amino acids TYPE: amino acid TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:77:
Arg Ser Phe Arg Gln Ala Lys Leu Ile Thr Lys Phe Ser Lys Ser Asp 1 5 10
Glu Val Lys Thr Leu Arg Trp Phe Pro Arg Ser Gln Asp Gln Glu Gln 25
Leu His Phe Thr Pro Met Arg Glu Phe Val His Pro His Phe Thr Glu 40
His Ile Asp Glu Glu Phe His Arg Phe Ile Asn Lys His Gly Lys Ile 55
Tyr Asn Lys Asn Glu Glu His His Phe Arg Lys Glu Ile Phe Arg Leu 65 70 75
Asn Leu Arg Tyr Ile Phe Ser Lys Asn Arg Ala Asn Leu Gly Tyr Thr 90
Leu Thr Val Asn His Leu Ala Asp Arg Thr Glu Ala Glu Leu Lys Ala 100 105 110
Leu Arg Gly His Arg Pro Ser Ser Gly Tyr Asn Gly Gly Leu Pro Phe 115 120 125
Pro His Asn Thr Thr Lys Glu Ala Arg Asn Leu Pro Asp Ser Phe Asp 130 135 140
Trp Arg Ile Tyr Gly Ala Val Thr Pro Val Lys Asp Gln Ser Val Cys 145 150 155 160
Gly Ser Cys Trp Ser Phe Gly Thr Ile Gly Ala Ile Glu Gly Ala Tyr 165 170 175
Phe Leu Lys Thr Ala Val Ile Cys Thr Met Ser His Ser Leu Met 180 185 190
INFORMATION FOR SEQ ID
NO:78:
SEQUENCE CHARACTERISTICS: LENGTH: 410 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: CDS LOCATION: 1..408
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78:
TGG GTT GTT ACT GCT GCT CAT TGT TTG AGA GGC AAA GAC CAC CTC CTG 48
Trp Val Val Thr Ala Ala His Cys Leu Arg Gly Lys Asp His Leu Leu 1 5 10
GAC AAA CTG TTC ATT GCA GTC GGC CTG ACA AAT TTA GGT GAA GGA GGC 96
Asp Lys Leu Phe Ile Ala Val Gly Leu Thr Asn Leu Gly Glu Gly Gly 25
ACC GTG TAT CCT GTA GAA AAA GGC ATC ATG CAC GAA GAA TAT GAA CAT 144
Thr Val Tyr Pro Val Glu Lys Gly Ile Met His Glu Glu Tyr Glu His 35 40
TAT GAC ATA GTC AAC GAT ATT GCA CTA ATC AAA GTC AAA TCT CCG ATA 192
Tyr Asp Ile Val Asn Asp Ile Ala Leu Ile Lys Val Lys Ser Pro Ile 55
GAA TTC AAT GAA AAA GTA ACG ACT GTA AAA TTA GGT GAG GAT TAT GTT 240
Glu Phe Asn Glu Lys Val Thr Thr Val Lys Leu Gly Glu Asp Tyr Val 70 75
GGC GGA GAC GTC CAA CTT CGA TTG ACA GGA TGG GGA GTT ACG ACA AAT 288
Gly Gly Asp Val Gln Leu Arg Leu Thr Gly Trp Gly Val Thr Thr Asn 90
GAG GGA ATC GGA AGC CCG AGT CAA AAA TTA CAG GTC ATG ACA GCC AAA 336
Glu Gly Ile Gly Ser Pro Ser Gln Lys Leu Gln Val Met Thr Ala Lys 100 105 110
TCA CTA ACT TAT GAG GAT TGC AAA AAC GCA ATT TAT AAA AAA GAC TTT 384
Ser Leu Thr Tyr Glu Asp Cys Lys Asn Ala Ile Tyr Lys Lys Asp Phe 115 120 125
CGA AAG CCA AAT TTG TGC ACA GGC TA 410
Arg Lys Pro Asn Leu Cys Thr Gly 130 135
INFORMATION FOR SEQ ID NO:79:
SEQUENCE CHARACTERISTICS: LENGTH: 136 amino acids TYPE: amino acid TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:79:
Trp Val Val Thr Ala Ala His Cys Leu Arg Gly Lys Asp His Leu Leu 1 5 10
Asp Lys Leu Phe Ile Ala Val Gly Leu Thr Asn Leu Gly Glu Gly Gly 20 25
Thr Val Tyr Pro Val Glu Lys Gly Ile Met His Glu Glu Tyr Glu His 40
Tyr Asp Ile Val Asn Asp Ile Ala Leu Ile Lys Val Lys Ser Pro Ile 55
Glu Phe Asn Glu Lys Val Thr Thr Val Lys Leu Gly Glu Asp Tyr Val 70 75
Gly Gly Asp Val Gln Leu Arg Leu Thr Gly Trp Gly Val Thr Thr Asn
Glu Gly Ile Gly Ser Pro Ser Gln Lys Leu Gln Val Met Thr Ala Lys 100 105 110
Ser Leu Thr Tyr Glu Asp Cys Lys Asn Ala Ile Tyr Lys Lys Asp Phe 115 120 125
Arg Lys Pro Asn Leu Cys Thr Gly 130 135
INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS: LENGTH: 433 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: CDS LOCATION: 1..432
(xi) SEQUENCE DESCRIPTION: SEQ ID
GTA ACT GCT GCA CAT TGC TTT TAT GGA ACG TTA TTT CCG ATT GGA TTC 48
Val Thr Ala Ala His Cys Phe Tyr Gly Thr Leu Phe Pro Ile Gly Phe 1 5 10
TCT GCG AGA GCC GGC AGC AGT ACT GTG AAT TCA GGA GGA ACT GTG CAT 96
Ser Ala Arg Ala Gly Ser Ser Thr Val Asn Ser Gly Gly Thr Val His 25
ACA ATT TTG TAT TGG TAT ATT CAT CCA AAT TAT GAT TCA CAA AGT ACA 144
Thr Ile Leu Tyr Trp Tyr Ile His Pro Asn Tyr Asp Ser Gln Ser Thr 40
GAC TTT GAT GTT TCT GTA GTT CGA CTA TTA TCT TCT TTA AAT TTG AAT 192
Asp Phe Asp Val Ser Val Val Arg Leu Leu Ser Ser Leu Asn Leu Asn 55
GGA GGT TCT ATT CGA CCG GCT AGG TTA GTG GAT TCT GGA ACT GAT TTG 240
Gly Gly Ser Ile Arg Pro Ala Arg Leu Val Asp Ser Gly Thr Asp Leu 65 70 75
CCA GCC GGT GAG ATG GTT ACA GTA ACT GGA TGG GGA CGA CTT TCG GAA 288
Pro Ala Gly Glu Met Val Thr Val Thr Gly Trp Gly Arg Leu Ser Glu 90
AAT ACT TCT GTT CCC TCG CCA TCA ACT CTT CAA GGA GTT ACA GTA CCA 336
Asn Thr Ser Val Pro Ser Pro Ser Thr Leu Gln Gly Val Thr Val Pro 100 105 110
GTT GTA AGT AAT TCG GAA TGT CAA CAA CAA TTG CAA AAT CAG ACA ATC 384
Val Val Ser Asn Ser Glu Cys Gln Gln Gln Leu Gln Asn Gln Thr Ile 115 120 125
ACT GAC AAT ATG TTT TGT GCT GGT GAA TTA GAA GGA GGA AAG GAC TCT 432
Thr Asp Asn Met Phe Cys Ala Gly Glu Leu Glu Gly Gly Lys Asp Ser 130 135 140
433
INFORMATION FOR SEQ ID NO:81:
SEQUENCE CHARACTERISTICS: LENGTH: 144 amino acids TYPE: amino acid TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:81:
Val Thr Ala Ala His Cys Phe Tyr Gly Thr Leu Phe Pro Ile Gly Phe 1 5 10
Ser Ala Arg Ala Gly Ser Ser Thr Val Asn Ser Gly Gly Thr Val His 25
Thr Ile Leu Tyr Trp Tyr Ile His Pro Asn Tyr Asp Ser Gln Ser Thr 40
Asp Phe Asp Val Ser Val Val Arg Leu Leu Ser Ser Leu Asn Leu Asn 55
Gly Gly Ser Ile Arg Pro Ala Arg Leu Val Asp Ser Gly Thr Asp Leu 65 70 75
Pro Ala Gly Glu Met Val Thr Val Thr Gly Trp Gly Arg Leu Ser Glu 90
Asn Thr Ser Val Pro Ser Pro Ser Thr Leu Gln Gly Val Thr Val Pro 100 105 110
Val Val Ser Asn Ser Glu Cys Gln Gln Gln Leu Gln Asn Gln Thr Ile 115 120 125
Thr Asp Asn Met Phe Cys Ala Gly Glu Leu Glu Gly Gly Lys Asp Ser 130 135 140
INFORMATION FOR SEQ ID NO:82:
SEQUENCE CHARACTERISTICS: LENGTH: 426 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: CDS LOCATION: 1..426
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:82:
GCC ACG ACA CCA AAT TCG AAC CTG AAG GTG CGT TTG GGC GAA TGG GAC 48
Ala Thr Thr Pro Asn Ser Asn Leu Lys Val Arg Leu Gly Glu Trp Asp 1 5 10
GTT CGC GAC CAC GAT
GAG CGA CTG AAC CAC GAG GAA TAC GCA ATC GAA 96
Val Arg Asp His Asp Glu Arg Leu Asn His Glu Glu Tyr Ala Ile Glu 25
CGC AAA GAA GTT CAT CCT TCA TAT TCA CCA ACC GAT TTC CGG AAT GAT 144
Arg Lys Glu Val His Pro Ser Tyr Ser Pro Thr Asp Phe Arg Asn Asp 35 40
GTA GCC TTA GTG AAA CTC GAT AGA ACT GTT ATT TTC AAA CAA CAT ATT 192
Val Ala Leu Val Lys Leu Asp Arg Thr Val Ile Phe Lys Gln His Ile 55
TTA CCT GTC TGC TTA CCT CAT AAG CAA ATG AAA CTG GCT GGA AAA ATG 240
Leu Pro Val Cys Leu Pro His Lys Gln Met Lys Leu Ala Gly Lys Met 70 75
GCA ACA GTC GCC GGA TGG GGA CGG ACG AGG CAC GGG CAG AGC ACT GTG 288
Ala Thr Val Ala Gly Trp Gly Arg Thr Arg His Gly Gln Ser Thr Val 90
CCG GCT GTC TTA CAA GAA GTC GAT GTC GAG GTG ATT CCG AAT GAA AGA 336
Pro Ala Val Leu Gln Glu Val Asp Val Glu Val Ile Pro Asn Glu Arg 100 105 110
TGC CAG AGG TGG TTC CGT GCT GCG GGT CGA CGA GAA ACC ATT CAC GAT 384
Cys Gln Arg Trp Phe Arg Ala Ala Gly Arg Arg Glu Thr Ile His Asp 115 120 125
GTC TTT CTC TGC GCC GGA TAT AAA GAG GGT GGT CGT GAT TCA 426
Val Phe Leu Cys Ala Gly Tyr Lys Glu Gly Gly Arg Asp Ser 130 135 140
INFORMATION FOR SEQ ID NO:83:
SEQUENCE CHARACTERISTICS: LENGTH: 142 amino acids TYPE: amino acid TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:83:
Ala Thr Thr Pro Asn Ser Asn Leu Lys Val Arg Leu Gly Glu Trp Asp 1 5 10
Val Arg Asp His Asp Glu Arg Leu Asn His Glu Glu Tyr Ala Ile Glu 20 25
Arg Lys Glu Val His Pro Ser Tyr Ser Pro Thr Asp Phe Arg Asn Asp 40
Val Ala Leu Val Lys Leu Asp Arg Thr Val Ile Phe Lys Gln His Ile 55
Leu Pro Val Cys Leu Pro His Lys Gln Met Lys Leu Ala Gly Lys Met 70 75
Ala Thr Val Ala Gly Trp Gly Arg Thr Arg His Gly Gln Ser Thr Val 90
Pro Ala Val Leu Gln Glu Val Asp Val Glu Val Ile Pro Asn Glu Arg 100 105 110
Cys Gln Arg Trp Phe Arg Ala Ala Gly Arg Arg Glu Thr Ile His Asp 115 120 125
Val Phe Leu Cys Ala Gly Tyr Lys Glu Gly Gly Arg Asp Ser 130 135 140
INFORMATION FOR SEQ ID NO:84:
SEQUENCE CHARACTERISTICS: LENGTH: 778 base
pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: CDS LOCATION: 2..778
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:84:
T GAT ACC TCA GAA TTG CCG GTA CAA AGG CGA ACG GTT TCG AGT GCG 46
Asp Thr Ser Glu Leu Pro Val Gln Arg Arg Thr Val Ser Ser Ala 1 5 10
GTT TGT CAA TTT TCG TGC GTC CTG GGC GGC GGA AAA CCT CTT GAC CTG 94
Val Cys Gln Phe Ser Cys Val Leu Gly Gly Gly Lys Pro Leu Asp Leu 25
TGC AGC GGC GGA ATG ATC TGG TCG TGC TGC GTC GAC AGG GAC ATT CGG 142
Cys Ser Gly Gly Met Ile Trp Ser Cys Cys Val Asp Arg Asp Ile Arg 40
CCT GAG CCG CAG CAC CAG GGC GCT CTG CAG AAC GCA ACT TGT GGA GAA 190
Pro Glu Pro Gln His Gln Gly Ala Leu Gln Asn Ala Thr Cys Gly Glu 55
TTG TAC ACG AGG TCT AAT AGA ATC GTA GGA GGT CAT TCA ACA GGA TTC 238
Leu Tyr Thr Arg Ser Asn Arg Ile Val Gly Gly His Ser Thr Gly Phe 70
GGG TCT CAT CCT TGG CAG GCG GCT TTG ATC AAA TCA GGA TTT TTG AGT 286
Gly Ser His Pro Trp Gln Ala Ala Leu Ile Lys Ser Gly Phe Leu Ser 80 85 90
AAA AAA TTA TCT TGC GGT GGC GCT TTA GTT AGC GAT CGA TGG GTT ATA 334
Lys Lys Leu Ser Cys Gly Gly Ala Leu Val Ser Asp Arg Trp Val Ile 100 105 110
ACT GCT GCA CAT TGC GTT GCC ACG ACA CCA AAT TCG AAC CTG AAG GTG 382
Thr Ala Ala His Cys Val Ala Thr Thr Pro Asn Ser Asn Leu Lys Val 115 120 125
CGA TTG GGC GAA TGG GAC GTC CGC GAC CAC GAT GAG CGA CTG AAC CAC 430
Arg Leu Gly Glu Trp Asp Val Arg Asp His Asp Glu Arg Leu Asn His 130 135 140
GAG GAA TAC GCA ATC GAA CGC AAA GAA GTT CAT CCT TCA TAT TCA CCA 478
Glu Glu Tyr Ala Ile Glu Arg Lys Glu Val His Pro Ser Tyr Ser Pro 145 150 155
ACC GAT TTC CGG AAT GAT GTA GCC TTA GTG AAA CTC GAT AGA ACT GTT 526
Thr Asp Phe Arg Asn Asp Val Ala Leu Val Lys Leu Asp Arg Thr Val 160 165 170 175
ATT TTC AAA CAA CAT ATT TTA CCT GTC TGC TTA CCT CAT AAG CAA ATG 574
Ile Phe Lys Gln His Ile Leu Pro Val Cys Leu Pro His Lys Gln Met 180 185 190
AAA CTG GCT GGA AAA ATG GCA ACA GTC GCC GGA TGG GGA CGG ACG AGG 622
Lys Leu Ala Gly Lys Met Ala Thr Val Ala Gly Trp Gly Arg Thr Arg 195 200 205
CAC GGG CAG AGC ACT GTG CCG GCT GTC TTA CAA GAA GTC GAT GTC GAG 670
His Gly Gln Ser Thr Val Pro Ala Val Leu Gln Glu Val Asp Val Glu 210 215 220
GTG ATT CCG AAT GAA AGA TGC CAG AGG TGG TTC CGT GCT GCG GGT CGA 718
Val Ile Pro Asn Glu Arg Cys Gln Arg Trp Phe Arg Ala Ala Gly Arg 225 230 235
CGA GAA ACC ATT CAC GAT GTC TTT CTC TGC GCC GGA TAT AAA GAG GGT 766
Arg Glu Thr Ile His Asp Val Phe Leu Cys Ala Gly Tyr Lys Glu Gly 240 245 250 255
GGT CGT GAT TCA 778
Gly Arg Asp Ser
INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS: LENGTH: 259 amino acids TYPE: amino acid TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID
Asp Thr Ser Glu Leu Pro Val Gln Arg Arg Thr Val Ser Ser Ala Val 10
Cys Gln Phe Ser Cys Val Leu Gly Gly Gly Lys Pro Leu Asp Leu Cys 25
Ser Gly Gly Met Ile Trp Ser Cys Cys Val Asp Arg Asp Ile Arg Pro 40
Glu Pro Gln His Gln Gly Ala Leu Gln Asn Ala Thr Cys Gly Glu Leu 55
Tyr Thr Arg Ser Asn Arg Ile Val Gly Gly His Ser Thr Gly Phe Gly 65 70 75
Ser His Pro Trp Gln Ala Ala Leu Ile Lys Ser Gly Phe Leu Ser Lys 90
Lys Leu Ser Cys Gly Gly Ala Leu Val Ser Asp Arg Trp Val Ile Thr 100 105 110
Ala Ala His Cys Val Ala Thr Thr Pro Asn Ser Asn Leu Lys Val Arg 115 120 125
Leu Gly Glu Trp Asp Val Arg Asp His Asp Glu Arg Leu Asn His Glu 130 135 140
Glu Tyr Ala Ile Glu Arg Lys Glu Val His Pro Ser Tyr Ser Pro Thr 145 150 155 160
Asp Phe Arg Asn Asp Val Ala Leu Val Lys Leu Asp Arg Thr Val Ile 165 170 175
Phe Lys Gln His Ile Leu Pro Val Cys Leu Pro His Lys Gln Met Lys 180 185 190
Leu Ala Gly Lys Met Ala Thr Val Ala Gly Trp Gly Arg Thr Arg His 195 200 205
Gly Gln Ser Thr Val Pro Ala Val Leu Gln Glu Val Asp Val Glu Val 210 215 220
Ile Pro Asn Glu Arg Cys Gln Arg Trp Phe Arg Ala Ala Gly Arg Arg 225 230 235 240
Glu Thr Ile His Asp Val Phe Leu Cys Ala Gly Tyr Lys Glu Gly Gly 245
Arg Asp Ser
INFORMATION FOR SEQ ID NO:86:
SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "probe"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:86:
TAAWGGWCCA GARTCTCCTT GACA 24
INFORMATION FOR SEQ ID NO:87:
SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:87:
GGAAACAGCT ATGACCATG 19
INFORMATION FOR SEQ ID NO:88:
SEQUENCE CHARACTERISTICS: LENGTH: 811 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: CDS LOCATION: 1..762
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:88:
ATG CGG GGT TCT CAT CAT CAT CAT CAT CAT GGT ATG GCT AGC ATG ACT 48
Met Arg Gly Ser His His His His His His Gly Met Ala Ser Met Thr 1 5 10
GGT GGA CAG CAA ATG GGT CGG GAT CTG TAC GAC GAT GAC GAT AAG AAG 96
Gly
Gly Gln Gln Met Gly Arg Asp Leu Tyr Asp Asp Asp Asp Lys Lys 25
GAT CCG TTA CCA GAT TCT TTC GAC TGG AGA ATT TAT GGA GCT GTT ACT 144
Asp Pro Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly Ala Val Thr 40
CCA GTT AAA GAT CAA TCT GTT TGT GGT TCC TGC TGG TCT TTC GGA ACA 192
Pro Val Lys Asp Gln Ser Val Cys Gly Ser Cys Trp Ser Phe Gly Thr 50 55
ATT GGA GCT ATC GAA GGT GCA TAT TTC TTG AAA AAC GGC GGT AAT CTT 240
Ile Gly Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly Gly Asn Leu 70 75
GTA CGA TTG TCT CAA CAG GCT TTG ATT GAT TGT TCT TGG GGA TAT GGA 288
Val Arg Leu Ser Gln Gln Ala Leu Ile Asp Cys Ser Trp Gly Tyr Gly 90
AAT AAT GGT TGT GAC GGT GGC GAG GAC TTC CGC GCT TAC CAA TGG ATG 336
Asn Asn Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr Gln Trp Met 100 105 110
ATG AAA CAT GGA GGA ATT CCT ACT GAA GAA GAT TAT GGT GGT TAC TTG 384
Met Lys His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly Gly Tyr Leu 115 120 125
GGA CAA GAT GGT TAC TGC CAT GTC AAC AAC GTT ACT TTA GTT GCT CCC 432
Gly Gln Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu Val Ala Pro 130 135 140
ATC ACA GGG TAT GTC AAC GTA ACA CGT AAT GAC GTT GAC GCT ATG AAG 480
Ile Thr Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp Ala Met Lys 145 150 155 160
GTT GCC CTT CTT AAA CAC GGT CCA ATT TCG GTG GCC ATT GAC GCA TCA 528
Val Ala Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile Asp Ala Ser 165 170 175
CAC AAA ACA TTC AGT TTT TAC TCC AAC GGC GTT TAC TAC GAA CCG AAA 576
His Lys Thr Phe Ser Phe Tyr Ser Asn Gly Val Tyr Tyr Glu Pro Lys 180 185 190
TGT GGC AAT AAA GAG GAC GAG TTG GAC CAT GCC GTA TTA GTA GTC GGT 624
Cys Gly Asn Lys Glu Asp Glu Leu Asp His Ala Val Leu Val Val Gly 195 200 205
TAT GGT GAA ATC AAC AAC GAA CCT TAC TGG TTG GTC AAG AAT TCC TGG 672
Tyr Gly Glu Ile Asn Asn Glu Pro Tyr Trp Leu Val Lys Asn Ser Trp 210 215 220
TCG AAT TTG TGG GGA AAT GAT GGT TAT ATT TTG ATG TCC GCC AGA AAT 720
Ser Asn Leu Trp Gly Asn Asp Gly Tyr Ile Leu Met Ser Ala Arg Asn 225 230 235 240
AAT AAT TGC GGA GTT TTG ACT GAT CCA ACT TAT GTT ACT ATG 762
Asn Asn Cys Gly Val Leu Thr Asp Pro Thr Tyr Val Thr Met 245 250
TAACTTACTT TTTAGGAATT TGATTTTTAC GGCATGTAAA TGGAAGCTT 811
INFORMATION FOR SEQ ID NO:89:
SEQUENCE CHARACTERISTICS: LENGTH: 254 amino acids TYPE: amino acid TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:89:
Met Arg Gly Ser His His His His His His Gly Met Ala Ser Met Thr 1 5 10
Gly Gly Gln Gln Met Gly Arg Asp Leu Tyr Asp Asp Asp Asp Lys Lys 25
Asp Pro Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly Ala Val Thr 40
Pro Val Lys Asp Gln Ser Val Cys Gly Ser Cys Trp Ser Phe Gly Thr 55
Ile Gly Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly Gly Asn Leu 70 75
Val Arg Leu Ser Gln Gln Ala Leu Ile Asp Cys Ser Trp Gly Tyr Gly 90
Asn Asn Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr Gln Trp Met 100 105 110
Met Lys His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly Gly Tyr Leu 115 120 125
Gly Gln Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu Val Ala Pro 130 135 140
Ile Thr Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp Ala Met Lys 145 150 155 160
Val Ala Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile Asp Ala Ser 165 170 175
His Lys Thr Phe Ser Phe Tyr Ser Asn Gly Val Tyr Tyr Glu Pro Lys 180 185 190
Cys Gly Asn Lys Glu Asp Glu Leu Asp His Ala Val Leu Val Val Gly 195 200 205
Tyr Gly Glu Ile Asn Asn Glu Pro Tyr Trp Leu Val Lys Asn Ser Trp 210 215 220
Ser Asn Leu Trp Gly Asn Asp Gly Tyr Ile Leu Met Ser Ala Arg Asn 225 230 235 240
Asn Asn Cys Gly Val Leu Thr Asp Pro Thr Tyr Val Thr Met 245 250
INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS: LENGTH: 823 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID
CCATTTACAT GCCGTAAAAA TCAAATTCCT AAAAAGTAAG TTACATAGTA ACATAAGTTG 60
GATCAGTCAA AACTCCGCAA TTATTATTTC TGGCGGACAT CAAAATATAA CCATCATTTC 120
CCCACAAATT CGACCAGGAA TTCTTGACCA ACCAGTAAGG TTCGTTGTTG ATTTCACCAT 180
AACCGACTAC TAATACGGCA TGGTCCAACT CGTCCTCTTT ATTGCCACAT TTCGGTTCGT 240
AGTAAACGCC
GTTGGAGTAA AAACTGAATG TTTTGTGTGA TGCGTCAATG GCCACCGAAA 300
TTGGACCGTG TTTAAGAAGG GCAACCTTCA TAGCGTCAAC GTCATTACGT GTTACGTTGA 360
CATACCCTGT GATGGGAGCA ACTAAAGTAA CGTTGTTGAC ATGGCAGTAA CCATCTTGTC 420
CCAAGTAACC ACCATAATCT TCTTCAGTAG GAATTCCTCC ATGTTTCATC ATCCATTGGT 480
AAGCGCGGAA GTCCTCGCCA CCGTCACAAC CATTATTTCC ATATCCCCAA GAACAATCAA 540
TCAAAGCCTG TTGAGACAAT CGTACAAGAT TACCGCCGTT TTTCAAGAAA TATGCACCTT 600
CGATAGCTCC AATTGTTCCG AAAGACCAGC AGGAACCACA AACAGATTGA TCTTTAACTG 660
GAGTAACAGC TCCATAAATT CTCCAGTCGA AAGAATCTGG TAACGGATCC TTATCTGGTA 720
ACGGATCCTT CTTATCGTCA TCGTCGTACA GATCCCGACC CATTTCCTGT CCACCAGTCA 780
TGCTAGCCAT ACCATGATGA TGATGATGAT GAGAACCCCG CAT 823

INFORMATION FOR SEQ ID NO:91:
SEQUENCE CHARACTERISTICS: LENGTH: 762 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: CDS LOCATION: 1..762
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:91:
ATG CGG GGT TCT CAT CAT CAT CAT CAT CAT GGT ATG GCT AGC ATG ACT 48
Met Arg Gly Ser His His His His His His Gly Met Ala Ser Met Thr 1 5 10
GGT GGA CAG CAA ATG GGT CGG GAT CTG TAC GAC GAT GAC GAT AAG AAG 96
Gly Gly Gln Gln Met Gly Arg Asp Leu Tyr Asp Asp Asp Asp Lys Lys 20 25
GAT CCG TTA CCA GAT TCT TTC GAC TGG AGA ATT TAT GGA GCT GTT ACT 144
Asp Pro Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly Ala Val Thr 40
CCA GTT AAA GAT CAA TCT GTT TGT GGT TCC TGC TGG TCT TTC GGA ACA 192
Pro Val Lys Asp Gln Ser Val Cys Gly Ser Cys Trp Ser Phe Gly Thr 55
ATT GGA GCT ATC GAA GGT GCA TAT TTC TTG AAA AAC GGC GGT AAT CTT 240
Ile Gly Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly Gly Asn Leu 70 75
GTA CGA TTG TCT CAA CAG GCT TTG ATT GAT TGT TCT TGG GGA TAT GGA 288
Val Arg Leu Ser Gln Gln Ala Leu Ile Asp Cys Ser Trp Gly Tyr Gly 90
AAT AAT GGT TGT GAC GGT GGC GAG GAC TTC CGC GCT TAC CAA TGG ATG 336
Asn Asn Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr Gln Trp Met 100 105 110
ATG AAA CAT GGA GGA ATT CCT ACT GAA GAA GAT TAT GGT GGT TAC TTG 384
Met Lys His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly Gly Tyr Leu 115 120 125
GGA CAA GAT GGT TAC TGC CAT GTC AAC AAC GTT ACT TTA GTT GCT CCC 432
Gly Gln Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu Val Ala Pro 130 135 140
ATC ACA GGG TAT GTC AAC GTA ACA CGT AAT GAC GTT GAC GCT ATG AAG 480
Ile Thr Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp Ala Met Lys 145 150 155 160
GTT GCC CTT CTT AAA CAC GGT CCA ATT TCG GTG GCC ATT GAC GCA TCA 528
Val Ala Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile Asp Ala Ser 165 170 175
CAC AAA ACA TTC AGT TTT TAC TCC AAC GGC GTT TAC TAC GAA CCG AAA 576
His Lys Thr Phe Ser Phe Tyr Ser Asn Gly Val Tyr Tyr Glu Pro Lys 180 185 190
TGT GGC AAT AAA GAG GAC GAG TTG GAC CAT GCC GTA TTA GTA GTC GGT 624
Cys Gly Asn Lys Glu Asp Glu Leu Asp His Ala Val Leu Val Val Gly 195 200 205
TAT GGT GAA ATC AAC AAC GAA CCT TAC TGG TTG GTC AAG AAT TCC TGG 672
Tyr Gly Glu Ile Asn Asn Glu Pro Tyr Trp Leu Val Lys Asn Ser Trp 210 215 220
TCG AAT TTG TGG GGA AAT GAT GGT TAT ATT TTG ATG TCC GCC AGA AAT 720
Ser Asn Leu Trp Gly Asn Asp Gly Tyr Ile Leu Met Ser Ala Arg Asn 225 230 235 240
AAT AAT TGC GGA GTT TTG ACT GAT CCA ACT TAT GTT ACT ATG 762
Asn Asn Cys Gly Val Leu Thr Asp Pro Thr Tyr Val Thr Met 245 250
INFORMATION FOR SEQ ID NO:92:
SEQUENCE CHARACTERISTICS: LENGTH: 254 amino acids TYPE: amino acid TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:92:
Met Arg Gly Ser His His His His His His Gly Met Ala Ser Met Thr 1 5 10
Gly Gly Gln Gln Met Gly Arg Asp Leu Tyr Asp Asp Asp Asp Lys Lys 25
Asp Pro Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly Ala Val Thr 40
Pro Val Lys Asp Gln Ser Val Cys Gly Ser Cys Trp Ser Phe Gly Thr 55
Ile Gly Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly Gly Asn Leu 70 75
Val Arg Leu Ser Gln Gln Ala Leu Ile Asp Cys Ser Trp Gly Tyr Gly
Asn Asn Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr Gln Trp Met 100 105 110
Met Lys His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly Gly Tyr Leu 115
Gly Gln Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu Val Ala Pro 130 135 140
Ile Thr Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp Ala Met Lys
Val Ala Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile Asp Ala Ser 165 170 175
His Lys Thr Phe Ser Phe Tyr Ser Asn Gly Val Tyr Tyr Glu Pro Lys 180 185 190
Cys Gly Asn Lys Glu Asp Glu Leu Asp His Ala Val Leu Val Val Gly 195 200
Tyr Gly Glu Ile Asn Asn Glu Pro Tyr Trp Leu Val Lys Asn Ser Trp 210 215 220
Ser Asn Leu Trp Gly Asn Asp Gly Tyr Ile Leu Met Ser Ala Arg Asn 225 230 235 240
Asn Asn Cys Gly Val Leu Thr Asp Pro Thr Tyr Val Thr Met 245 250
INFORMATION FOR SEQ ID NO:93:
SEQUENCE CHARACTERISTICS: LENGTH: 783 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:93:
TTACATAGTA ACATAAGTTG GATCAGTCAA AACTCCGCAA TTATTATTTC TGGCGGACAT 60
CAAAATATAA CCATCATTTC CCCACAAATT CGACCAGGAA TTCTTGACCA ACCAGTAAGG 120
TTCGTTGTTG ATTTCACCAT AACCGACTAC TAATACGGCA TGGTCCAACT CGTCCTCTTT 180
ATTGCCACAT TTCGGTTCGT AGTAAACGCC GTTGGAGTAA AAACTGAATG TTTTGTGTGA 240
TGCGTCAATG GCCACCGAAA TTGGACCGTG TTTAAGAAGG GCAACCTTCA TAGCGTCAAC 300
GTCATTACGT GTTACGTTGA CATACCCTGT GATGGGAGCA ACTAAAGTAA CGTTGTTGAC 360
ATGGCAGTAA CCATCTTGTC CCAAGTAACC ACCATAATCT TCTTCAGTAG GAATTCCTCC 420
ATGTTTCATC ATCCATTGGT AAGCGCGGAA GTCCTCGCCA CCGTCACAAC CATTATTTCC 480
ATATCCCCAA GAACAATCAA TCAAAGCCTG TTGAGACAAT CGTACAAGAT TACCGCCGTT 540
TTTCAAGAAA TATGCACCTT CGATAGCTCC AATTGTTCCG AAAGACCAGC AGGAACCACA 600
AACAGATTGA TCTTTAACTG GAGTAACAGC TCCATAAATT CTCCAGTCGA AAGAATCTGG 660
TAACGGATCC TTATCTGGTA ACGGATCCTT CTTATCGTCA TCGTCGTACA GATCCCGACC 720
CATTTCCTGT CCACCAGTCA TGCTAGCCAT ACCATGATGA TGATGATGAT GAGAACCCCG 780
CAT 783
INFORMATION FOR SEQ ID NO:94:
SEQUENCE CHARACTERISTICS: LENGTH: 660 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: CDS LOCATION: 1..660
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:94:
TTA CCA GAT TCT TTC GAC TGG AGA ATT TAT GGA GCT GTT ACT CCA GTT
Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly Ala Val Thr Pro Val 1 5 10
AAA GAT CAA TCT GTT TGT GGT TCC TGC TGG TCT TTC GGA ACA ATT GGA
Lys Asp Gln Ser Val Cys Gly Ser Cys Trp Ser Phe Gly Thr Ile Gly 20 25
GCT ATC GAA GGT GCA TAT TTC TTG AAA AAC GGC GGT AAT CTT GTA CGA
Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly Gly Asn Leu Val Arg 40
TTG TCT CAA CAG GCT TTG ATT GAT TGT TCT TGG GGA TAT GGA AAT AAT
Leu Ser Gln Gln Ala Leu Ile Asp Cys Ser Trp Gly Tyr Gly Asn Asn 55
GGT TGT GAC GGT GGC GAG GAC TTC CGC GCT TAC CAA TGG ATG ATG AAA
Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr Gln Trp Met Met Lys 70 75
CAT GGA GGA ATT CCT ACT GAA GAA GAT TAT GGT GGT TAC TTG GGA CAA
His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly Gly Tyr Leu Gly Gln 90
GAT GGT TAC TGC CAT GTC AAC AAC GTT ACT TTA GTT GCT CCC ATC ACA
Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu Val Ala Pro Ile Thr 100 105 110
GGG TAT GTC AAC GTA ACA CGT AAT GAC GTT GAC GCT ATG AAG GTT GCC
Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp Ala Met Lys Val Ala 115 120 125
CTT CTT AAA CAC GGT CCA ATT TCG GTG GCC ATT GAC GCA TCA CAC AAA
Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile Asp Ala Ser His Lys 130 135 140
ACA TTC AGT TTT TAC TCC AAC GGC GTT TAC TAC GAA CCG AAA TGT GGC
Thr Phe Ser Phe Tyr Ser Asn Gly Val Tyr Tyr Glu Pro Lys Cys Gly 145 150 155 160
AAT AAA GAG GAC GAG TTG GAC CAT GCC GTA TTA GTA GTC GGT TAT GGT
Asn Lys Glu Asp Glu Leu Asp His Ala Val Leu Val Val Gly Tyr Gly 165 170 175
GAA ATC AAC AAC GAA CCT TAC TGG TTG GTC AAG AAT TCC TGG TCG AAT
Glu Ile Asn Asn Glu Pro Tyr Trp Leu Val Lys Asn Ser Trp Ser Asn 180 185 190
TTG TGG GGA AAT GAT GGT TAT ATT TTG ATG TCC GCC AGA AAT AAT AAT
Leu Trp Gly
Asn Asp Gly Tyr Ile Leu Met Ser Ala Arg Asn Asn Asn 195 200 205 TGC GGA GTT TTG ACT GAT CCA ACT TAT GTT ACT ATG Cys Gly Val Leu Thr Asp Pro Thr Tyr Val Thr Met 210 215 220 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 220 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Leu Pro Asp Ser Phe Asp Trp Arg Ile Tyr Gly Ala Val Thr Pro Val 1 5 10 Lys Asp Gin Ser Val Cys Gly Ser Cys Trp Ser Phe Gly Thr Ile Gly 25 Ala Ile Glu Gly Ala Tyr Phe Leu Lys Asn Gly Gly Asn Leu Val Arg 40 Leu Ser Gin Gin Ala Leu Ile Asp Cys Ser Trp Gly Tyr Gly Asn Asn 50 55 Gly Cys Asp Gly Gly Glu Asp Phe Arg Ala Tyr Gin Trp Met Met Lys 70 75 His Gly Gly Ile Pro Thr Glu Glu Asp Tyr Gly Gly Tyr Leu Gly Gin 90 Asp Gly Tyr Cys His Val Asn Asn Val Thr Leu Val Ala Pro Ile Thr 100 105 110 Gly Tyr Val Asn Val Thr Arg Asn Asp Val Asp Ala Met Lys Val Ala 115 120 125 WO 97/40058 PCT/US97/06121 243 Leu Leu Lys His Gly Pro Ile Ser Val Ala Ile Asp Ala Ser His Lys 130 135 140 Thr Phe Ser Phe Tyr Ser Asn Gly Val Tyr Tyr Glu Pro Lys Cys Gly 145 150 155 160 Asn Lys Glu Asp Glu Leu Asp His Ala Val Leu Val Val Gly Tyr Gly 165 170 175 Glu Ile Asn Asn Glu Pro Tyr Trp Leu Val Lys Asn Ser Trp Ser Asn 180 185 190 Leu Trp Gly Asn Asp Gly Tyr Ile Leu Met Ser Ala Arg Asn Asn Asn 195 200 205 Cys Gly Val Leu Thr Asp Pro Thr Tyr Val Thr Met 210 215 220 INFORMATION FOR SEQ ID NO:96: SEQUENCE CHARACTERISTICS: LENGTH: 25 amino acids TYPE: amino acid

STRANDEDNESS:

TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:96: Xaa Pro Pro Pro Glu Met Leu Gly Gly Pro Ser Ile Phe Ile Phe Pro 1 5 10 Pro Lys Lys Lys Asp Asp Leu Leu Ile INFORMATION FOR SEQ ID NO:97: SEQUENCE CHARACTERISTICS: LENGTH: 25 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:97: TTGGGATACA CTTTGACTGT TAACC INFORMATION FOR SEQ ID NO:98: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:98: GTGAGCAACC ATTATTTCCA TATC 24 INFORMATION FOR SEQ ID NO:99: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:99: TGGGTWGTWA CWGCWGCWCA TTG 23 INFORMATION FOR SEQ ID NO:100: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:100: ATTCCTCGTG GTTCAGTCGC TC 22 INFORMATION FOR SEQ ID NO:101: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:101: GAAGATGTWG ATATTTCWAC ATGTGG 26 INFORMATION FOR SEQ ID NO:102: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:102: GAAAATGAAA TCCACTTAAA CATTACG 27 INFORMATION FOR SEQ ID NO:103: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid 
STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:103: CTCTTATTGT ACGAGGGATG C 21 INFORMATION FOR SEQ ID NO:104: SEQUENCE CHARACTERISTICS: LENGTH: 25 amino acids TYPE: amino acid

STRANDEDNESS:

TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:104: Asp Cys Pro Lys Cys Pro Pro Pro Glu Met Leu Gly Gly Pro Ser Ile 1 5 10 Phe Ile Phe Pro Pro Lys Pro Lys Asp INFORMATION FOR SEQ ID NO:105: SEQUENCE CHARACTERISTICS: LENGTH: 35 amino acids TYPE: amino acid

STRANDEDNESS:

TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:105: Asp Cys Pro Lys Cys Pro Pro Pro Glu Met Leu Gly Gly Pro Ser Ile 1 5 10 Phe Ile Phe Pro Pro Lys Pro Lys Asp Asp Leu Leu Ile Lys Arg Lys 25 Ser Glu Val INFORMATION FOR SEQ ID NO:106: SEQUENCE CHARACTERISTICS: LENGTH: 35 amino acids TYPE: amino acid

STRANDEDNESS:

TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:106: Asp Cys Pro Lys Cys Pro Pro Pro Glu Met Leu Gly Gly Pro Ser Ile 1 5 10 Phe Ile Phe Pro Pro Lys Pro Lys Asp Thr Leu Ser Ile Ser Arg Thr 25 Pro Glu Val INFORMATION FOR SEQ ID NO:107: SEQUENCE CHARACTERISTICS: LENGTH: 23 amino acids TYPE: amino acid

STRANDEDNESS:

TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:107: Leu Ala Thr Thr Gln Phe Gln Ala Thr His Ala Arg Ser Ala Phe Pro 1 5 10 Cys Phe Asp Glu Pro Ala Met INFORMATION FOR SEQ ID NO:108: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:108: CCCAAATTTT CCATWGCNCC NGC 23 INFORMATION FOR SEQ ID NO:109: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1..20 OTHER INFORMATION: /label= primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:109: CAATTYCAAG CTACYCATGC INFORMATION FOR SEQ ID NO:110: SEQUENCE CHARACTERISTICS: LENGTH: 383 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 2..382 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:110: C CGT AGT GCT TTC CCT TGT TTC GAT GAA CCA GCA ATG AAG GCC CAT 46 Arg Ser Ala Phe Pro Cys Phe Asp Glu Pro Ala Met Lys Ala His 1 5 10 TTC GAA ATC AGC CTT ATA CAC CAT GAA AAA TTG AAA GCA ATT TCC AAT 94 Phe Glu Ile Ser Leu Ile His His Glu Lys Leu Lys Ala Ile Ser Asn 25 ATG GGT GTA GCA AAG GAA GAA AAC TTA GAT AAC AAC CGA AAA AGA ACA 142 Met Gly Val Ala Lys Glu Glu Asn Leu Asp Asn Asn Arg Lys Arg Thr 40 ACA TTC GAA CAA TCA GTT CTC ATG TCT CCA TAC CTG GTG GCG TTT ATT 190 Thr Phe Glu Gln Ser Val Leu Met Ser Pro Tyr Leu Val Ala Phe Ile 55 ATC TCA GAT TTC GAA TAT GTA GAA AAA ATT TCA GGA CCA GTG AAA TAC 238 Ile Ser Asp Phe Glu Tyr Val Glu Lys Ile Ser Gly Pro Val Lys Tyr 70 AGA ATA TAT ACT GAT CCT TTC TCG ATT GAT CAA GCT GAC TAT GCA TTG 286 Arg Ile Tyr Thr Asp Pro Phe Ser Ile Asp Gln Ala Asp Tyr Ala Leu 85 90 ACT ATG AGC CCC AAA AAT TTT AAC GGC TTT GGA ACA ACT CAC AGG TGT 334 Thr Met Ser Pro Lys Asn Phe Asn Gly Phe Gly Thr Thr His Arg Cys 100 105 110 AAA ATA TGT TTT GAA CAA GTT GGA CCA AGC AGC CAT TCC AGA TTT TGC 382 Lys Ile Cys Phe Glu Gln Val Gly Pro Ser Ser His Ser Arg Phe Cys 115 120 125 T 383
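By way of illustration only, the relationship between a CDS LOCATION entry and the amino acid line printed beneath each row of codons can be sketched in Python. The sketch assumes the standard genetic code. The names translate_cds, CODON_TABLE and THREE_LETTER are hypothetical helpers introduced for this example and form no part of the listing. As simplifying assumptions, any codon containing an ambiguity code (W, Y, S, N and the like) is reported as Xaa, and a stop codon is reported as Ter.

```python
# Illustrative sketch only: standard genetic code, built from the
# conventional T, C, A, G ordering of the three codon positions.
BASES = "TCAG"
AMINO_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_1[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter residue names ("Ter" for stop is an assumption).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate_cds(seq, start, end):
    """Translate positions start..end (1-based, inclusive) of a DNA string.

    Spaces are ignored. Codons that are not plain A/C/G/T triplets
    (for example, ones containing an ambiguity code) are returned as "Xaa".
    """
    cds = seq.replace(" ", "").upper()[start - 1:end]
    residues = []
    # Step through complete codons only; a trailing partial codon is skipped.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        one_letter = CODON_TABLE.get(cds[i:i + 3])
        residues.append(THREE_LETTER[one_letter] if one_letter else "Xaa")
    return residues


# First row of SEQ ID NO:110 as printed above (positions 1..46, CDS from 2).
first_row = "C CGT AGT GCT TTC CCT TGT TTC GAT GAA CCA GCA ATG AAG GCC CAT"
print(" ".join(translate_cds(first_row, 2, 46)))
```

Applied to the first row of SEQ ID NO:110 with the coding region starting at position 2, the sketch yields the same fifteen residues that the listing prints beneath that row (Arg Ser Ala Phe Pro Cys Phe Asp Glu Pro Ala Met Lys Ala His).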

INFORMATION FOR SEQ ID NO:111: SEQUENCE CHARACTERISTICS: LENGTH: 127 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:111: Arg Ser Ala Phe Pro Cys Phe Asp Glu Pro Ala Met Lys Ala His Phe 1 5 10 15 Glu Ile Ser Leu Ile His His Glu Lys Leu Lys Ala Ile Ser Asn Met 20 25 30 Gly Val Ala Lys Glu Glu Asn Leu Asp Asn Asn Arg Lys Arg Thr Thr 35 40 45 Phe Glu Gln Ser Val Leu Met Ser Pro Tyr Leu Val Ala Phe Ile Ile 50 55 60 Ser Asp Phe Glu Tyr Val Glu Lys Ile Ser Gly Pro Val Lys Tyr Arg 65 70 75 80 Ile Tyr Thr Asp Pro Phe Ser Ile Asp Gln Ala Asp Tyr Ala Leu Thr 85 90 95 Met Ser Pro Lys Asn Phe Asn Gly Phe Gly Thr Thr His Arg Cys Lys 100 105 110 Ile Cys Phe Glu Gln Val Gly Pro Ser Ser His Ser Arg Phe Cys 115 120 125
INFORMATION FOR SEQ ID NO:112: SEQUENCE CHARACTERISTICS: LENGTH: 537 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 2..535 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:112: A TAT ATT TTG GAA ATA AAA GGC TTC AAA TCT ACT TTG AGA GAT GAC 46 Tyr Ile Leu Glu Ile Lys Gly Phe Lys Ser Thr Leu Arg Asp Asp 1 5 10 ATG GCT GGC TTC TAC AAA AGT TCA TAC AAA ACG CCA AAA GGA GAA ACA 94 Met Ala Gly Phe Tyr Lys Ser Ser Tyr Lys Thr Pro Lys Gly Glu Thr 25 AGA TGG TTG GCT ACA ACC CAG TTT CAG GCA ACT CAT GCC CGT AGT GCT 142 Arg Trp Leu Ala Thr Thr Gln Phe Gln Ala Thr His Ala Arg Ser Ala 35 40 TTC CCT TGT TTC GAT GAA CCA GCA ATG AAG GCC CAT TTC GAA ATC AGC 190 Phe Pro Cys Phe Asp Glu Pro Ala Met Lys Ala His Phe Glu Ile Ser 55 CTT ATA CAC CAT GAA AAA TTG AAA GCA ATT TCC AAT ATG GGT GTA GCA 238 Leu Ile His His Glu Lys Leu Lys Ala Ile Ser Asn Met Gly Val Ala 70 AAG GAA GAA AAC TTA GAT AAC AAC CGA AAA AGA ACA ACA TTC GAA CAA 286 Lys Glu Glu Asn Leu Asp Asn Asn Arg Lys Arg Thr Thr Phe Glu Gln 85 90 TCA GTT CTC ATG TCT CCA TAC CTG GTG GCG TTT ATT ATC TCA GAT TTC 334 Ser Val Leu Met Ser Pro Tyr Leu Val Ala Phe Ile Ile Ser Asp Phe 100 105 110 GAA TAT GTA GAA AAA ATT TCA GGA CCA GTG AAA TAC AGA ATA TAT ACT 382 Glu Tyr Val Glu Lys Ile Ser Gly Pro Val Lys Tyr Arg Ile Tyr Thr 115 120 125 GAT CCT TTC TCG ATT GAT CAA GCT GAC TAT GCA TTG ACT ATG AGC CCC 430 Asp Pro Phe Ser Ile Asp Gln Ala Asp Tyr Ala Leu Thr Met Ser Pro 130 135 140 AAA AAT TTT AAC GGC TTT GGA ACA ACT CAC AGG TGT AAA ATA TGT TTT 478 Lys Asn Phe Asn Gly Phe Gly Thr Thr His Arg Cys Lys Ile Cys Phe 145 150 155 GAA CAA GTT GGA CCA AGC AGC CAT TCC AGA TTT TGC TGC CGG CGC AAT 526 Glu Gln Val Gly Pro Ser Ser His Ser Arg Phe Cys Cys Arg Arg Asn 160 165 170 175 GGA AAA TTT GG 537 Gly Lys Phe
INFORMATION FOR SEQ ID NO:113: SEQUENCE CHARACTERISTICS: LENGTH: 178 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:113: Tyr Ile Leu Glu Ile Lys Gly Phe Lys Ser Thr Leu Arg Asp Asp Met 1 5 10 Ala Gly Phe Tyr Lys Ser Ser Tyr Lys Thr Pro Lys Gly Glu Thr Arg 25 Trp Leu Ala Thr Thr Gln Phe Gln Ala Thr His Ala Arg Ser Ala Phe 40 Pro Cys Phe Asp Glu Pro Ala Met Lys Ala His Phe Glu Ile Ser Leu 55 Ile His His Glu Lys Leu Lys Ala Ile Ser Asn Met Gly Val Ala Lys 65 70 75 Glu Glu Asn Leu Asp Asn Asn Arg Lys Arg Thr Thr Phe Glu Gln Ser 90 Val Leu Met Ser Pro Tyr Leu Val Ala Phe Ile Ile Ser Asp Phe Glu 100 105 110 Tyr Val Glu Lys Ile Ser Gly Pro Val Lys Tyr Arg Ile Tyr Thr Asp 115 120 125 Pro Phe Ser Ile Asp Gln Ala Asp Tyr Ala Leu Thr Met Ser Pro Lys 130 135 140 Asn Phe Asn Gly Phe Gly Thr Thr His Arg Cys Lys Ile Cys Phe Glu 145 150 155 160 Gln Val Gly Pro Ser Ser His Ser Arg Phe Cys Cys Arg Arg Asn Gly 165 170 175 Lys Phe INFORMATION FOR SEQ ID NO:114: SEQUENCE CHARACTERISTICS: LENGTH: 806 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 2..736 OTHER INFORMATION: /note= "At pos. bp 656, change G to S. At pos. aa 219, substitute Xaa." 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:114: C TCG CAG TCT GCA CCC TTG GTG CTA GGC GTT CCT GAC TTT TGG AAC 46 Ser Gln Ser Ala Pro Leu Val Leu Gly Val Pro Asp Phe Trp Asn 1 5 10 AGG TTA GAT GGC AGA ATC GTT GGA GGA CAC GAT ACT AGC ATT GAC AAA 94 Arg Leu Asp Gly Arg Ile Val Gly Gly His Asp Thr Ser Ile Asp Lys 25 CAT CCT CAT CAA GTA TCT TTG ATT TAC ACA AAC CAC AAT TGT GGT GGT 142 His Pro His Gln Val Ser Leu Ile Tyr Thr Asn His Asn Cys Gly Gly 40 TCT ATT GGC AAA AAC TGG GTT TTA ACA GCA GCT CAT TGG ATC AGC 190 Ser Ile Ala Lys Asn Trp Val Leu Thr Ala Ala His Cys Ile Ser 50 55 TCA TAC TAG AGA GTC GGG GTA GGA AGT TCA ATC AAG AAG AGT GGT 238 Ser Tyr Tyr Arg Val Arg Val Gly Ser Ser Ile Lys Asn Ser Gly GGT GTT GAG AGG GTT AAA AAG CAA ATC AAG CAT CCA AAA TTC GGT 286 Gly Val His Ser Val Lys Asn Gln Ile Lys His Pro Lys Phe Gly 85 90 GAT TCG GGG ACA CTC GAG TTG GAT TTT GGA CTT GTG GAA TTG GAT GAA 334 Asp Ser Ala Thr Leu Asp Phe Asp Phe Ala Leu Leu Glu Leu Asp Glu 100 105 110 CCA GTT ACA GTA ACA AAA GAG GTC AAC ATG ATG AAA GTA GTA GAC CAA 382 Pro Val Thr Val Thr Lys Asp Val Asn Ile Ile Lys Leu Val Asp Gln 115 120 125 GAT GTA GAA TTA ACA GGT GGA ACT ATG TGC ACT GTT ACT GGA TGG GGA 430 Asp Val Glu Leu Thr Pro Gly Thr Met Cys Thr Val Thr Gly Trp Gly 130 135 140 TCA ACT GGA TGT GGT GGT GCA ATT AGA AAT GTT GTA GAA GAA GTG GAA 478 Ser Thr Gly Ser Gly Gly Pro Ile Thr Asn Val Leu Gln Glu Val Glu 145 150 155 GTT CCA TTT ATG GAG TTG AAG ACC TGG CGA AAA TGC TAG TCA ACC AGG 526 Val Pro Phe Ile Asp Phe Asn Thr Cys Arg Lys Ser Tyr Ser Thr Ser 160 165 170 175 TTA ACC GAG GGT ATG TTG TGC GGT GGA TTT TTG GGA ATT GGT GGT AAG 574 Leu Thr Asp Arg Met Phe Cys Ala Gly Phe Leu Gly Ile Gly Gly Lys 180 185 190 GAG GCT TGT CAA GGT GAG TCT GGT GGC GGA GTT GTT GTC GAT GGT GTT 622 Asp Ala Cys Gln Gly Asp Ser Gly Gly Pro Val Val Val Asp Gly Val 195 200 205 CTT GAG GGA ATG GTA TGA TGG GGA GGT GGT TGG SGG GTT GGT GAG TAG 670 Leu His Gly Ile Val Ser Trp Gly Arg Gly Cys Xaa Leu Pro Asp Tyr 210 215 220 CCC GGA GTG TAG TCT AAG ATC TCA TAT GCG CGT GAG TGG ATT AAG GAA 718 Pro Gly Val Tyr Ser Lys Ile Ser Tyr Ala Arg Asp Trp Ile Lys Glu 225 230 235 AAT GAG TGG TGT TTA ATT TAATATTTAT TATACCAAAT AATTATATAT 766 Asn His Trp Cys Leu Ile 240 245 AAATATATAG TATTTTAAAT ACAAAAAAAA AAAAAAAA

INFORMATION FOR SEQ ID NO:115: SEQUENCE CHARACTERISTICS: LENGTH: 245 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:115: Ser Gln Ser Ala Pro Leu Val Leu Gly Val Pro Asp Phe Trp Asn Arg 1 5 10 15 Leu Asp Gly Arg Ile Val Gly Gly His Asp Thr Ser Ile Asp Lys His 20 25 30 Pro His Gln Val Ser Leu Ile Tyr Thr Asn His Asn Cys Gly Gly Ser 35 40 45 Leu Ile Ala Lys Asn Trp Val Leu Thr Ala Ala His Cys Ile Ser Ser 50 55 60 Thr Tyr Tyr Arg Val Arg Val Gly Ser Ser Ile Lys Asn Ser Gly Gly 65 70 75 80 Val Val His Ser Val Lys Asn Gln Ile Lys His Pro Lys Phe Gly Asp 85 90 95 Ser Ala Thr Leu Asp Phe Asp Phe Ala Leu Leu Glu Leu Asp Glu Pro 100 105 110 Val Thr Val Thr Lys Asp Val Asn Ile Ile Lys Leu Val Asp Gln Asp 115 120 125 Val Glu Leu Thr Pro Gly Thr Met Cys Thr Val Thr Gly Trp Gly Ser 130 135 140 Thr Gly Ser Gly Gly Pro Ile Thr Asn Val Leu Gln Glu Val Glu Val 145 150 155 160 Pro Phe Ile Asp Phe Asn Thr Cys Arg Lys Ser Tyr Ser Thr Ser Leu 165 170 175 Thr Asp Arg Met Phe Cys Ala Gly Phe Leu Gly Ile Gly Gly Lys Asp 180 185 190 Ala Cys Gln Gly Asp Ser Gly Gly Pro Val Val Val Asp Gly Val Leu 195 200 205 His Gly Ile Val Ser Trp Gly Arg Gly Cys Xaa Leu Pro Asp Tyr Pro 210 215 220 Gly Val Tyr Ser Lys Ile Ser Tyr Ala Arg Asp Trp Ile Lys Glu Asn 225 230 235 240 His Trp Cys Leu Ile 245

INFORMATION FOR SEQ ID NO:116: SEQUENCE CHARACTERISTICS: LENGTH: 157 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:116: GGAATTCGGC ACGAGTATCT GTGCCCTCGC AGTCTGCACC CTTGGCGTAG CGTTCCTGAC TTTTGGAACA GGTTAGATGG CAGAATCGTT GGAGGACACG ATACTAGCAT TGATAACATC 120 CTCATGCAAG TATCTTTGAG TTTACACAAA CCACAAT 157 INFORMATION FOR SEQ ID NO:117: SEQUENCE CHARACTERISTICS: LENGTH: 218 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:117: AAGATGTAGA ATTAACACCT GGAACTATGT GCACTGTTAC TGGATGGGGA TCAACTGGAT CTGGTGGTCC AATTACAAAT GTTCTACAAG AAGTCGAAGT TCCATTTATC GACTTCAACA 120 CCTGCCGAAA ATCCTACTCA ACCAGCTTAA CCGACCGTAT GTTCTGCGCT GGATTTTTGG 180 GAATTGGTGG TAAGGACGCT TGCCAAGGCG ACTCCGGA 218 INFORMATION FOR SEQ ID NO:118: SEQUENCE CHARACTERISTICS: LENGTH: 307 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..306 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:118: CAA ATA TTT AAG CAT GAA CCT GTG AAC GTG GTA TTA TTT CTT GTA GAA 48 Gln Ile Phe Lys His Glu Pro Val Asn Val Val Leu Phe Leu Val Glu 1 5 10 GAT CGC TTC AAT TTT GAA ATT ACT AAC GCG CGC CCT GTT GCC CTA CCA 96

Asp Arg Phe Asn Phe Glu Ile Thr Asn Ala Arg Pro Val Ala Leu Pro 25 GCA GAG AAC GAG GAA ACC GAA ACA GGG TCA CCA CTC ACG GTG ACG GGT 144 Ala Glu Asn Glu Glu Thr Glu Thr Gly Ser Pro Leu Thr Val Thr Gly 35 40 TGG GGA ACT ACA GAG AGT ACT GAA TCA TCA CAC CAC CTG AAA GAA GTT 192 Trp Gly Thr Thr Glu Ser Thr Glu Ser Ser His His Leu Lys Glu Val 55 GAA GTG AAC GCT GTA TCT AAT AGT GAA TGT CAA AAG GCC TAT GAA GAT 240 Glu Val Asn Ala Val Ser Asn Ser Glu Cys Gln Lys Ala Tyr Glu Asp 70 75 CTT GCT ACT ATA TCA TCA CAT GAG ATA TGT GCA AGC GTT CCT GGT GGC 288 Leu Ala Thr Ile Ser Ser His Glu Ile Cys Ala Ser Val Pro Gly Gly 90 GGC AAA GAT TCT TGT CAA G 307 Gly Lys Asp Ser Cys Gln 100
INFORMATION FOR SEQ ID NO:119: SEQUENCE CHARACTERISTICS: LENGTH: 102 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:119: Gln Ile Phe Lys His Glu Pro Val Asn Val Val Leu Phe Leu Val Glu 1 5 10 15 Asp Arg Phe Asn Phe Glu Ile Thr Asn Ala Arg Pro Val Ala Leu Pro 20 25 30 Ala Glu Asn Glu Glu Thr Glu Thr Gly Ser Pro Leu Thr Val Thr Gly 35 40 45 Trp Gly Thr Thr Glu Ser Thr Glu Ser Ser His His Leu Lys Glu Val 50 55 60 Glu Val Asn Ala Val Ser Asn Ser Glu Cys Gln Lys Ala Tyr Glu Asp 65 70 75 80 Leu Ala Thr Ile Ser Ser His Glu Ile Cys Ala Ser Val Pro Gly Gly 85 90 95 Gly Lys Asp Ser Cys Gln 100
INFORMATION FOR SEQ ID NO:120: SEQUENCE CHARACTERISTICS: LENGTH: 162 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:120: GGTTGGGGAA CTACAGAGAG TACTGAATCA TCACACCACC TGAAAGAAGT TGAAGTGAAC GCTGTATCTA ATAGTGAATG TCAAAGGCCT AATGAAGATC TTGCTACTAT ATCATCACAT 120 GAGATATGTG CAAGCGTTCC TGGTGGCGGC AAAGATTCTT GT 162
INFORMATION FOR SEQ ID NO:121: SEQUENCE CHARACTERISTICS: LENGTH: 252 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:121: CGCGCGCCCT GTTGCCCTAC CAGCAGAGAA CGAGGAAACC GAAACAGGGT CACACTCACG GTGACGGGTT GGGGAACTAC AGAGAGTACT GAATCATCAC ACCACCTGAA AGAAGTTGAA 120 GTGAACGCTG TATCTAATAG TGAATGTCAA AGGCCTAATG AAGATCTTGC TACTATATCA 180 TCACATGAGA TATGTGCAAG CGTTCCTGGT GGCGGCAAAG ATTCTTGTCA AGGAGACTCT 240 GGTGGTCCTT TA 252
INFORMATION FOR SEQ ID NO:122: SEQUENCE CHARACTERISTICS: LENGTH: 267 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..267 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:122: CCG ACT ACC TTA GGC GGA TCT ACT GGT CAC AGC AAT GAT ATT GCC CTA 48 Pro Thr Thr Leu Gly Gly Ser Thr Gly His Ser Asn Asp Ile Ala Leu 1 5 10 ATC AAA GTC GAT AGA GAT ATC AAA TTC AGC AAA ACT GTC CAA CCT ATC 96 Ile Lys Val Asp Arg Asp Ile Lys Phe Ser Lys Thr Val Gln Pro Ile 20 25 AAA TTG CAC AAA AGT TTA ATA AAT GGA GGT GAA AAA TTG AAA ATT ACT 144 Lys Leu His Lys Ser Leu Ile Asn Gly Gly Glu Lys Leu Lys Ile Thr 40 GGA TGG GGA TTG ACG AAT CAA AGT CAT AGT GAT GAA CCA GAT GTT CTT 192 Gly Trp Gly Leu Thr Asn Gln Ser His Ser Asp Glu Pro Asp Val Leu 55 CAA GAG TTG CAT GTA AAA GCA CTT ACT GAT TCT GAG TGC GAG AAA GCT 240 Gln Glu Leu His Val Lys Ala Leu Thr Asp Ser Glu Cys Glu Lys Ala 70 75 ACA GGT GAA GAC CAT CCT ACA CAC CTT 267 Thr Gly Glu Asp His Pro Thr His Leu
INFORMATION FOR SEQ ID NO:123: SEQUENCE CHARACTERISTICS: LENGTH: 89 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:123: Pro Thr Thr Leu Gly Gly Ser Thr Gly His Ser Asn Asp Ile Ala Leu 1 5 10 15 Ile Lys Val Asp Arg Asp Ile Lys Phe Ser Lys Thr Val Gln Pro Ile 20 25 30 Lys Leu His Lys Ser Leu Ile Asn Gly Gly Glu Lys Leu Lys Ile Thr 35 40 45 Gly Trp Gly Leu Thr Asn Gln Ser His Ser Asp Glu Pro Asp Val Leu 50 55 60 Gln Glu Leu His Val Lys Ala Leu Thr Asp Ser Glu Cys Glu Lys Ala 65 70 75 80 Thr Gly Glu Asp His Pro Thr His Leu 85
INFORMATION FOR SEQ ID NO:124: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "probe" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:124: TAAWGGWCCW CCYGAATCTC CCTGGCA 27
INFORMATION FOR SEQ ID NO:125: SEQUENCE CHARACTERISTICS: LENGTH: 436 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 2..436 OTHER INFORMATION: /note= "At pos. bp 301, change A to W. At pos. bp 342, change C to Y. At pos. bp 431, change G to S. At pos. aa 100, 114 and 144, substitute Xaa." 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:125: A GTT AGT CTT TCC AAT TCG ATC AGA CCT TCT TGT TTA TGG GCC AAT 46 Val Ser Leu Ser Asn Ser Ile Arg Pro Ser Cys Leu Trp Ala Asn 1 5 10 GAC GAG TTC GAC ACA GAT AGT TCA ATT GCT ACT GGT TGG GGA AAG ATA 94 Asp Glu Phe Asp Thr Asp Ser Ser Ile Ala Thr Gly Trp Gly Lys Ile 20 25 GAC TAT GCT GAG AGC AGA AGT GAT GAC CTA CTG AAA GTA GTA CTG AAA 142 Asp Tyr Ala Glu Ser Arg Ser Asp Asp Leu Leu Lys Val Val Leu Lys 40 ATT ATT GAT AAT AGG CAA TGC GCT CCC TTA TAC GTT GAT CAG ATT AAT 190 Ile Ile Asp Asn Arg Gln Cys Ala Pro Leu Tyr Val Asp Gln Ile Asn 55 AGA AGA AGA TTG AGA AAT GGA ATT GTA GAT ACA CAG ATG TGT GCA GGA 238 Arg Arg Arg Leu Arg Asn Gly Ile Val Asp Thr Gln Met Cys Ala Gly 70 GAA TTG GAT GGT GGC AAA GAC ACT TGC CAG GGA GAT TCA GGT GGG CCA 286 Glu Leu Asp Gly Gly Lys Asp Thr Cys Gln Gly Asp Ser Gly Gly Pro 85 90 TTG CAA ATA ACA AAW CAA AGC AAC AAA TGT ATC TTC TAC ATA GTG GGA 334 Leu Gln Ile Thr Xaa Gln Ser Asn Lys Cys Ile Phe Tyr Ile Val Gly 100 105 110 ATA ACA TYA TTC GGA AGG GGA TGT GGT GCT CCT AAT AGC CCC GGT GTT 382 Ile Thr Xaa Phe Gly Arg Gly Cys Gly Ala Pro Asn Ser Pro Gly Val 115 120 125 TAT ACT AGA GTC AGY AAG TAT GTT GAC TGG ATT GAA AGT GTT GTT TGG 430 Tyr Thr Arg Val Ser Lys Tyr Val Asp Trp Ile Glu Ser Val Val Trp 130 135 140 SCA AAT 436 Xaa Asn 145
INFORMATION FOR SEQ ID NO:126: SEQUENCE CHARACTERISTICS: LENGTH: 145 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:126: Val Ser Leu Ser Asn Ser Ile Arg Pro Ser Cys Leu Trp Ala Asn Asp 1 5 10 15 Glu Phe Asp Thr Asp Ser Ser Ile Ala Thr Gly Trp Gly Lys Ile Asp 20 25 30 Tyr Ala Glu Ser Arg Ser Asp Asp Leu Leu Lys Val Val Leu Lys Ile 35 40 45 Ile Asp Asn Arg Gln Cys Ala Pro Leu Tyr Val Asp Gln Ile Asn Arg 50 55 60 Arg Arg Leu Arg Asn Gly Ile Val Asp Thr Gln Met Cys Ala Gly Glu 65 70 75 80 Leu Asp Gly Gly Lys Asp Thr Cys Gln Gly Asp Ser Gly Gly Pro Leu 85 90 95 Gln Ile Thr Xaa Gln Ser Asn Lys Cys Ile Phe Tyr Ile Val Gly Ile 100 105 110 Thr Xaa Phe Gly Arg Gly Cys Gly Ala Pro Asn Ser Pro Gly Val Tyr 115 120 125 Thr Arg Val Ser Lys Tyr Val Asp Trp Ile Glu Ser Val Val Trp Xaa 130 135 140 Asn 145

INFORMATION FOR SEQ ID NO:127: SEQUENCE CHARACTERISTICS: LENGTH: 299 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:127: AATTCGGCAC GAGAGTTAGT CTTTCCAATT CGATCAGACC TTCTTGTTTA TGGGCCAATG ACGAGTTCGA CACAGATAGT TCAATTGCTA CTGGTTGGGG AAAGATAGAC TATGCTGAGA 120 GCAGAAGTGA TGACCTACTG AAAGTAGTAC TGAAAATTAT TGATAATAGG CAATGCGVYC 180 CCTTATACGT TGATCAGATT AATAGAAGAA GATTGAGAAA TGGAATTGTA GATACACAGA 240 TGTGTGCAGG AGAATTGGAT GGTGGCAAAG ACACTTGCCA GGGAGATTCA GGTGGTCCT 299
INFORMATION FOR SEQ ID NO:128: SEQUENCE CHARACTERISTICS: LENGTH: 758 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..738 OTHER INFORMATION: /note= "At pos. aa 241, substitute Xaa." (xi) SEQUENCE DESCRIPTION: SEQ ID NO:128: AAG GTA CTG ATC GTT TTA GCA GTC ATT GAA TTC GCA TCA GCG TCT TCA 48 Lys Val Leu Ile Val Leu Ala Val Ile Glu Phe Ala Ser Ala Ser Ser 1 5 10 ATC GGC TGG AGA ATC GTG GGT GGT GAA AAT GCT AAA GAA AAA TCG GTG 96 Ile Gly Trp Arg Ile Val Gly Gly Glu Asn Ala Lys Glu Lys Ser Val 25 CCC TAT CAA GTT TCA CTT CGA AAT GCT GAA AAC AAA CAT TTC TGT GGA 144 Pro Tyr Gln Val Ser Leu Arg Asn Ala Glu Asn Lys His Phe Cys Gly 35 40 GGA GCA ATT ATT GAC GAT TAT TGG GTT TTG ACT GCT GCT CAT TGC ATG 192 Gly Ala Ile Ile Asp Asp Tyr Trp Val Leu Thr Ala Ala His Cys Met 55 GGA CAA CGT TTT GAA GTC GTT GCC GGC GTG AAT AAA CTG GAT GAA GTA 240 Gly Gln Arg Phe Glu Val Val Ala Gly Val Asn Lys Leu Asp Glu Val 70 75 GGT GAA CGA TAT AGA ATA GAA AAA ACT ATT ACT GAC AAG TTT GAT GAA 288 Gly Glu Arg Tyr Arg Ile Glu Lys Thr Ile Thr Asp Lys Phe Asp Glu 90 CAA ACT GCC GCA AAT GAT TTG GCA CTT GTA AAA CTT CGG AAT AAA ATA 336 Gln Thr Ala Ala Asn Asp Leu Ala Leu Val Lys Leu Arg Asn Lys Ile 100 105 110 AAA TTC AGC GAT AAA GTG CAA AAA ATT CAA TTT GAA GAT AAA TAT ATC 384 Lys Phe Ser Asp Lys Val Gln Lys Ile Gln Phe Glu Asp Lys Tyr Ile 115 120 125 GGA GGC GGA GAG GAT GCT CGT TTG ACT GGA TGG GGA CGA TTG GGA AAA 432 Gly Gly Gly Glu Asp Ala Arg Leu Thr Gly Trp Gly Arg Leu Gly Lys 130 135 140 GAT TCA CCG CCA CCT AAT GAT TTA CAG GAA TTA AAT ACA TTT ACC ATC 480 Asp Ser Pro Pro Pro Asn Asp Leu Gln Glu Leu Asn Thr Phe Thr Ile 145 150 155 160 CCC CAA AGT GTT TGC AGA AGA ATG TTT AAT GAG GAT AAG ATT CCA ATC 528 Pro Gln Ser Val Cys Arg Arg Met Phe Asn Glu Asp Lys Ile Pro Ile 165 170 175 CAC GAT AGC CAA ATA TGC ACT TTT GCA GAT ATG GGC AAG GGT GCA TGT 576 His Asp Ser Gln Ile Cys Thr Phe Ala Asp Met Gly Lys Gly Ala Cys 180 185 190 AAG GGT GAT TCC GGT GGC CCC TTA GTC ATC AAT GGA CAA CTT CAT GGA 624 Lys Gly Asp Ser Gly Gly Pro Leu Val Ile Asn Gly Gln Leu His Gly 195 200 205 ATT GTT TCC TGG GGC ATT CCT TGC GCT GTC GGC AAG CCT GAT GTA TTC 672 Ile Val Ser Trp Gly Ile Pro Cys Ala Val Gly Lys Pro Asp Val Phe 210 215 220 ACA AGA GTT TCT CAT TAT GTC GAT TGG ATT AAA TCC AAA ATT GCC AAA 720 Thr Arg Val Ser His Tyr Val Asp Trp Ile Lys Ser Lys Ile Ala Lys 225 230 235 240 TAA AAT TGT TTA GTG TAT TAAAAAAAAA AAAAAAAAAA 758 Xaa Asn Cys Leu Val Tyr 245
INFORMATION FOR SEQ ID NO:129: SEQUENCE CHARACTERISTICS: LENGTH: 246 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:129: Lys Val Leu Ile Val Leu Ala Val Ile Glu Phe Ala Ser Ala Ser Ser 1 5 10 Ile Gly Trp Arg Ile Val Gly Gly Glu Asn Ala Lys Glu Lys Ser Val 25 Pro Tyr Gln Val Ser Leu Arg Asn Ala Glu Asn Lys His Phe Cys Gly 40 Gly Ala Ile Ile Asp Asp Tyr Trp Val Leu Thr Ala Ala His Cys Met 55 Gly Gln Arg Phe Glu Val Val Ala Gly Val Asn Lys Leu Asp Glu Val 70 75 Gly Glu Arg Tyr Arg Ile Glu Lys Thr Ile Thr Asp Lys Phe Asp Glu 90 Gln Thr Ala Ala Asn Asp Leu Ala Leu Val Lys Leu Arg Asn Lys Ile 100 105 110 Lys Phe Ser Asp Lys Val Gln Lys Ile Gln Phe Glu Asp Lys Tyr Ile 115 120 125 Gly Gly Gly Glu Asp Ala Arg Leu Thr Gly Trp Gly Arg Leu Gly Lys 130 135 140 Asp Ser Pro Pro Pro Asn Asp Leu Gln Glu Leu Asn Thr Phe Thr Ile 145 150 155 160 Pro Gln Ser Val Cys Arg Arg Met Phe Asn Glu Asp Lys Ile Pro Ile 165 170 175 His Asp Ser Gln Ile Cys Thr Phe Ala Asp Met Gly Lys Gly Ala Cys 180 185 190 Lys Gly Asp Ser Gly Gly Pro Leu Val Ile Asn Gly Gln Leu His Gly 195 200 205 Ile Val Ser Trp Gly Ile Pro Cys Ala Val Gly Lys Pro Asp Val Phe 210 215 220 Thr Arg Val Ser His Tyr Val Asp Trp Ile Lys Ser Lys Ile Ala Lys 225 230 235 240 Xaa Asn Cys Leu Val Tyr 245
INFORMATION FOR SEQ ID NO:130: SEQUENCE CHARACTERISTICS: LENGTH: 144 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:130: GTACTGATCG TTTTAGCAGT CATTGAATTC GCATCAGCGT CTTCAATCGG CTGGAGAATC GTGGGTGGTG AAAATGCTAA AGAAAAATCG GTGCCCTATC AAGTTTCMCT TCGAAATGCT 120 GAAAACAAAC ATTTYTGTGG RGGR 144

262 INFORMATION FOR SEQ ID NO:131: SEQUENCE CHARACTERISTICS: LENGTH: 225 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:131: CCAATCCACG ATAGCCAATA TGCACTTTTG CAGATATGGG TCAAGGGTGC ATGTAAGGGT GATTCCGGTG GCCCCTTAGT CATCAATGGA CAACTTCATG GAATTGTTTC CTGGGGCATT 120 CCTTGCGCTG TCGCAAGCCT GATGTATTCA CAAGAGTTTC TCATTATGTC GATTGGATTA 180 AATCCAAAAT TGCCAAATAA AATTGTTTAG AGTATTAAAA AAAAA 225 INFORMATION FOR SEQ ID NO:132: SEQUENCE CHARACTERISTICS: LENGTH: 610 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..555 OTHER INFORMATION: /note= "At pos. bp 133, change A to H. At pos. aa 45, substitute Xaa." (xi) SEQUENCE DESCRIPTION: SEQ ID NO:132: GTT TTG ACA GCT GCT CAT TGT GTA GAA AAG TAT AAA TTG ACT GTT AGA 48 Val Leu Thr Ala Ala His Cys Val Glu Lys Tyr Lys Leu Thr Val Arg 1 5 10 GTT GGC AGC AGC GAT TTG GAA TCA GGA GGC AAA ATA CAT ACC ATT AAA 96 Val Gly Ser Ser Asp Leu Glu Ser Gly Gly Lys Ile His Thr Ile Lys 25 AAA ATC CAT GTT CAT CCA TAT TAC GAA CCG GTT GAC HAC GAT TTT GCC 144 Lys Ile His Val His Pro Tyr Tyr Glu Pro Val Asp Xaa Asp Phe Ala 40 TTG CTG GGA CTT GAT GAA CCA GTW TTC TTG AGC AAC AAA GTT CAA CTT 192 Leu Leu Gly Leu Asp Glu Pro Val Phe Leu Ser Asn Lys Val Gin Leu 55 GTG AAA CTT GTA GAA CAA GGT GTA GAT CTG GAT GAA GGA ACC TTT CTT 240 Val Lys Leu Val Glu Gin Gly Val Asp Leu Asp Glu Gly Thr Phe Leu 65. 
70 75 AAT GCC ACT GGA TGG GGT ACA ACA GCG ACC GAA GAT TTG GCT CCA GTT 288 Asn Ala Thr Gly Trp Gly Thr Thr Ala Thr Glu Asp Leu Ala Pro Val 90 WO 97/40058 WO 9740058PCTIUS97/06 121.- 263 CTT CAA TTA GTA ACA GTT CCA GTA GTC AAC ACA TAT ACT TGC AGC AAA Leu Gin Leu Val Thr Val Pro Val Val Asn Thr Tyr Thr Cys Ser Lys 100 105 110 ATT TAC GAO TTT GGT ATC ACA CAA AGA ATG TTT TGC GCT GGT TAT ATG Ile Tyr Asp Phe Giy Ile Thr Gin Arg Met Phe Cys Aia Giy Tyr Met 115 120 125 GAT GGA ACT CTT AAG GAO ATC TGC TCT GGA GAT TCA GGT AGT CCT GTG Asp Giy Thr Leu Lys Asp Ile Cys Ser Gly Asp Ser Giy Ser Pro Vai 130 135 140 GTG AAG GAT GGT ATC CAA TAT GGT GTG GTG TCT TGG GGA AAA GCT TGT Vai Lys Asp Giy Ile Gin Tyr Giy Vai Val Ser Trp Gly Lys Ala Cys 145 150 155 160 GCC GAT CCA AGA TAT COA AAT GTT TAT TCC AAA GTT AGC TAC GAA CGT Aia Asp Pro Arg Tyr Pro Asn Val Tyr Ser Lys Vai Ser Tyr Giu Arg 165 170 175 ATA TGG ATT AAA GAA GTG TOT GGA GTT TAAGTAGATG CATTOTTTAT Ile Trp Ile Lys Giu Vai Ser Giy Val 180 185 TATAATAAAA TTGTTTAA.TT AAAAYAAAAA AAAAA INFORMATION FOR SEQ ID NO:133: SEQUENCE CHARACTERISTICS: LENGTH: 185 amino acids TYPE: amino acid TOPOLOGY: linear 336 384 432 480 528 575 610 (ii) MOLECULE TYPE: protein Val 1 Vai Lys Leu Val As n Leu Ile (xi) SEQUENCE Leu Thr Ala Ala 5 Gly Ser Ser Asp 20 Ile His Vai His Leu Gly Leu Asp Lys Leu Val Giu Aia Thr Giy Trp Gin Leu Val Thr 100 Tyr Asp Phe Gly 115 DESCRIPTION: SEQ ID His Cys Val Glu Lys 10 Leu Giu Ser Gly Gly 25 Pro Tyr Tyr Giu Pro 40 NO: 13 3: Tyr Lys Lys Ile Val Asp Gi u Gin 70 Gi y Vai Ile Pro 55 Gi y Thr Pro Thr Phe Asp Al a Val 105 Arg Asn Giu Asp Tyr Cys Leu His Xaa Lys Gly Leu Th r Ala 125 Thr Thr Asp Val Thr Al a Cys 110 Gly Val Ile Phe Gin Phe Pro Ser Tyr Arg Lys Ala Leu Leu Vai Lys Met WO 97/40(058 PCT/US97/06121.

264 Asp Gly Thr Leu Lys Asp Ile Cys Ser Gly Asp Ser Gly Ser Pro Val 130 135 140 Val Lys Asp Gly Ile Gin Tyr Gly Val Val Ser Trp Gly Lys Ala Cys 145 150 155 160 Ala Asp Pro Arg Tyr Pro Asn Val Tyr Ser Lys Val Ser Tyr Glu Arg 165 170 175 Ile Trp Ile Lys Glu Val Ser Gly Val 180 185 INFORMATION FOR SEQ ID NO:134: SEQUENCE CHARACTERISTICS: LENGTH: 386 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..384 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:134: AAR ATT GGT ACT AGC CAT AGG ATA CTT TAT GGT AGA GTG ATA GAC ATA Lys Ile Gly Thr Ser His Arg Ile Leu Tyr Gly Arg Val Ile Asp Ile 1 5 10 AAA GAA ATT ATA ATG CAT CCA GAC TAT ACG TCT GTA TCA GGA AGT GGA Lys Glu Ile Ile Met His Pro Asp Tyr Thr Ser Val Ser Gly Ser Gly 20 25 TAC GAT GTA GCA CTA TTG AAA CCG TCT ACA AAA ATT GTT TTT AAC TCA Tyr Asp Val Ala Leu Leu Lys Pro Ser Thr Lys Ile Val Phe Asn Ser 40 AAA TCT ATC AAA CCT GTA AAG CTA ATT GAT GAA GGA ATC GAA ACG GCT Lys Ser Ile Lys Pro Val Lys Leu Ile Asp Glu Gly Ile Glu Thr Ala 55 AAT GGT TCA ATA GCA ACC GTG GCA GGT TGG GGT AAA GTA GTG GAT GGT Asn Gly Ser Ile Ala Thr Val Ala Gly Trp Gly Lys Val Val Asp Gly 70 75 TTT CCG TAC ATA CCT AAT TAT TTA TTG GCT GTA AAT GTA CCG ATT ATT Phe Pro Tyr Ile Pro Asn Tyr Leu Leu Ala Val Asn Val Pro Ile Ile 90 GAT AGC GAT ACA TGC AAG TCG ATG AAT ATT GAA TAT CAA AAA TAT TTG Asp Ser Asp Thr Cys Lys Ser Met Asn Ile Glu Tyr Gin Lys Tyr Leu 100 105 110 AAA CCA AAT ATG ATA TGC GCC GGA TAT GCA AAA GGT GGT AAA GAT TCT Lys Pro Asn Met Ile Cys Ala Gly Tyr Ala Lys Gly Gly Lys Asp Ser 115 120 125

TG


265 INFORMATION FOR SEQ ID NO:135: SEQUENCE CHARACTERISTICS: LENGTH: 128 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:135: Lys Ile Gly Thr Ser His Arg Ile Leu Tyr Gly Arg Val Ile Asp Ile 1 5 10 Lys Glu Ile Ile Met His Pro Asp Tyr Thr Ser Val Ser Gly Ser Gly 25 Tyr Asp Val Ala Leu Leu Lys Pro Ser Thr Lys Ile Val Phe Asn Ser 40 Lys Ser Ile Lys Pro Val Lys Leu Ile Asp Glu Gly Ile Glu Thr Ala 50 55 Asn Gly Ser Ile Ala Thr Val Ala Gly Trp Gly Lys Val Val Asp Gly 70 75 Phe Pro Tyr Ile Pro Asn Tyr Leu Leu Ala Val Asn Val Pro Ile Ile 90 Asp Ser Asp Thr Cys Lys Ser Met Asn Ile Glu Tyr Gin Lys Tyr Leu 100 105 110 Lys Pro Asn Met Ile Cys Ala Gly Tyr Ala Lys Gly Gly Lys Asp Ser 115 120 125 INFORMATION FOR SEQ ID NO:136: SEQUENCE

CHARACTERISTICS: LENGTH: 423 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..423 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:136:
GTA GTT ACA GCT GCA CAT TGT GTC ACC GTT GGA GGA CAT AAC CAA GTC 48
Val Val Thr Ala Ala His Cys Val Thr Val Gly Gly His Asn Gln Val
GTA GCT GTT GTA GGA ACC AAC AAA TTG AGC TCC GGA GGC ACC ACA TAC 96
Val Ala Val Val Gly Thr Asn Lys Leu Ser Ser Gly Gly Thr Thr Tyr
AAA GCT GAA CGT GTT GTT GTA CAC GAA CGT TAT GGC AAT GCT GAT ATT 144
Lys Ala Glu Arg Val Val Val His Glu Arg Tyr Gly Asn Ala Asp Ile
GAC AAC GAT CTT GCC TTG ATC AAG TTG ACC CAA GAT GTC GTA TTC ACT 192
Asp Asn Asp Leu Ala Leu Ile Lys Leu Thr Gln Asp Val Val Phe Thr
GAC CGC GTA CAG CCC GTC ACC GTA TCC AGA ACT ACA GTC AAA GGA GGA 240
Asp Arg Val Gln Pro Val Thr Val Ser Arg Thr Thr Val Lys Gly Gly
GAA ACC TTG AGA ATC ACT GGA TGG GGT TAC ACC AAC CAC GGT GGC CCA 288
Glu Thr Leu Arg Ile Thr Gly Trp Gly Tyr Thr Asn His Gly Gly Pro
GTT CTG CCC GAC AGT TTG CAA GAA CTT CAT GTA ACC GCC CAG ACC CCA 336
Val Leu Pro Asp Ser Leu Gln Glu Leu His Val Thr Ala Gln Thr Pro
AGC ACA TGC CAA AAA TAC ACA CCA GCC GCC ACA CAA TTG TGC ACT TTC 384
Ser Thr Cys Gln Lys Tyr Thr Pro Ala Ala Thr Gln Leu Cys Thr Phe
TTG AAG ACT GGA CAA GGA GTT TGC AAC GGT GAT TCT GGA 423
Leu Lys Thr Gly Gln Gly Val Cys Asn Gly Asp Ser Gly

INFORMATION FOR SEQ ID NO:137: SEQUENCE CHARACTERISTICS: LENGTH: 141 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:137:
Val Val Thr Ala Ala His Cys Val Thr Val Gly Gly His Asn Gln Val
Val Ala Val Val Gly Thr Asn Lys Leu Ser Ser Gly Gly Thr Thr Tyr
Lys Ala Glu Arg Val Val Val His Glu Arg Tyr Gly Asn Ala Asp Ile
Asp Asn Asp Leu Ala Leu Ile Lys Leu Thr Gln Asp Val Val Phe Thr
Asp Arg Val Gln Pro Val Thr Val Ser Arg Thr Thr Val Lys Gly Gly

267 Glu Thr Leu Arg Ile Thr Gly Trp Gly Tyr Thr Asn His Gly Gly Pro -90 Val Leu Pro Asp Ser Leu Gin Glu Leu His Val Thr Ala Gin Thr Pro 100 105 110 Ser Thr Cys Gin Lys Tyr Thr Pro Ala Ala Thr Gin Leu Cys Thr Phe 115 120 125 Leu Lys Thr Gly Gin Gly Val Cys Asn Gly Asp Ser Gly 130 135 140 INFORMATION FOR SEQ ID NO:138: SEQUENCE CHARACTERISTICS: LENGTH: 197 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 3..197 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:138: GT GGT GGG CTG CCT TCT TCA ATA ACA TTA GCG CGA GTC AGA CTC GGC 47 Gly Gly Leu Pro Ser Ser Ile Thr Leu Ala Arg Val Arg Leu Gly 1 5 10 GAA CAT AAT GAT CAA TCG GGT ATA GAT TGC GAG GAC GAT GTT TGC GCA Glu His Asn Asp Gin Ser Gly Ile Asp Cys Glu Asp Asp Val Cys Ala 20 25 GAA CCT GTC CAA GAT TTC GAT CCT GTG AAA ATA ATT CCA CAT CCA GAA 143 Glu Pro Val Gin Asp Phe Asp Pro Val Lys Ile Ile Pro His Pro Glu 40 TAC AAA GAC GAA CTA TTT AAA CAT GAT ATA GCT CTG ATA AAA TTG GTA 191 Tyr Lys Asp Glu Leu Phe Lys His Asp Ile Ala Leu Ile Lys Leu Val 55 GAA AAT 197 Glu Asn INFORMATION FOR SEQ ID NO:139: SEQUENCE CHARACTERISTICS: LENGTH: 65 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:139: Gly Gly Leu Pro Ser Ser Ile Thr Leu Ala Arg Val Arg Leu Gly Glu 1 5 10 WO 97/40058 PCT/US97/06121.

268 His Asn Asp Gin Ser Gly Ile Asp Cys Glu Asp Asp Val Cys Ala Glu -25 Pro Val Gin Asp Phe Asp Pro Val Lys Ile Ile Pro His Pro Glu Tyr 40 Lys Asp Glu Leu Phe Lys His Asp Ile Ala Leu Ile Lys Leu Val Glu 55 Asn INFORMATION FOR SEQ ID NO:140: SEQUENCE CHARACTERISTICS: LENGTH: 341 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 3..341 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:140: GA AAA TCT CGA ATT GGG CGA CAG CTT AAA ATT CAC TAT GGA AAT AAC 47 Lys Ser Arg Ile Gly Arg Gin Leu Lys Ile His Tyr Gly Asn Asn 1 5 10 GAC TGG CAC TTT GGC TTC GTT AGT ATT GTA AAA AAG GCT ATT ATT CAT Asp Trp His Phe Gly Phe Val Ser Ile Val Lys Lys Ala Ile Ile His 20 25 CCA AAT TAC AAC CCA GTG ACA TTT GAT AGT GAT GTG GCC CTT CTG AAG 143 Pro Asn Tyr Asn Pro Val Thr Phe Asp Ser Asp Val Ala Leu Leu Lys 40 CTG CAC TCT CCA ATT ACC TTC ACA AAT GGC GTT CAT AAA GTG TCG CTG 191 Leu His Ser Pro Ile Thr Phe Thr Asn Gly Val His Lys Val Ser Leu 55 GTC GAA AAA GGT CAA GAT CCT GTA CCT TAT TCA CCT GCG ATG ATC ACT 239 Val Glu Lys Gly Gin Asp Pro Val Pro Tyr Ser Pro Ala Met Ile Thr 70 GGC TGG GGC CAT ACA ATG GAA GGT GAT ACT AGT ATT TCG CAA ATT TTA 287 Gly Trp Gly His Thr Met Glu Gly Asp Thr Ser Ile Ser Gin Ile Leu 85 90 CAA GGA GCT GTG GTC CCA ATC GTA AAC AGA AAT GAT TGT CCG AAT TAT 335 Gln Gly Ala Val Val Pro Ile Val Asn Arg Asn Asp Cys Pro Asn Tyr 100 105 110 GGA CTC 341 Gly Leu WO 97/40058 PCT/US97/06121.

INFORMATION FOR SEQ ID NO:141: SEQUENCE CHARACTERISTICS: LENGTH: 113 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:141:
Lys Ser Arg Ile Gly Arg Gln Leu Lys Ile His Tyr Gly Asn Asn Asp
Trp His Phe Gly Phe Val Ser Ile Val Lys Lys Ala Ile Ile His Pro
Asn Tyr Asn Pro Val Thr Phe Asp Ser Asp Val Ala Leu Leu Lys Leu
His Ser Pro Ile Thr Phe Thr Asn Gly Val His Lys Val Ser Leu Val
Glu Lys Gly Gln Asp Pro Val Pro Tyr Ser Pro Ala Met Ile Thr Gly
Trp Gly His Thr Met Glu Gly Asp Thr Ser Ile Ser Gln Ile Leu Gln
Gly Ala Val Val Pro Ile Val Asn Arg Asn Asp Cys Pro Asn Tyr Gly
Leu

(2) INFORMATION FOR SEQ ID NO:142: SEQUENCE CHARACTERISTICS: LENGTH: 390 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..390 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:142:
GTA TAT GGC AGA GCA ACT CCA TCA CTG TTC ACC GTT GTA TCA GGC GCA 48
Val Tyr Gly Arg Ala Thr Pro Ser Leu Phe Thr Val Val Ser Gly Ala
CTT TAC TTA ACT GAA GGT GGA GAA CAT CAT GCT GTT GCT TCC ATC AAA 96
Leu Tyr Leu Thr Glu Gly Gly Glu His His Ala Val Ala Ser Ile Lys
TAT CAC GAA AAG TAC AGC CCA AAC ACT TTG GAC AAT GAT GTG GCA GTT 144
Tyr His Glu Lys Tyr Ser Pro Asn Thr Leu Asp Asn Asp Val Ala Val
TTG AAG TTG AAA CAG CCA TTG ACT TTC AAT GCT AAC CAG AAA CCT GTC 192
Leu Lys Leu Lys Gln Pro Leu Thr Phe Asn Ala Asn Gln Lys Pro Val
GCC TTG GCC TCA AAG GAT ACA CCT GGA GAC CTC AAA TGC AAA TTC TCT 240
Ala Leu Ala Ser Lys Asp Thr Pro Gly Asp Leu Lys Cys Lys Phe Ser
GGT TGG GGA TTA GAC GCA TAT CCA AGT GAT GTT TTA CCA AAT CAC TTA 288
Gly Trp Gly Leu Asp Ala Tyr Pro Ser Asp Val Leu Pro Asn His Leu
CAA AAA ATG GAT GTT CTG ACC TAC AAT AAC GCT GAC TGC CAA AAG TTC 336
Gln Lys Met Asp Val Leu Thr Tyr Asn Asn Ala Asp Cys Gln Lys Phe
CAT AAT GCT GGA CCT AAA TCT AAC ACA ATC TAC CCA GGA ATG TTG TGC 384
His Asn Ala Gly Pro Lys Ser Asn Thr Ile Tyr Pro Gly Met Leu Cys
GGA TTC 390
Gly Phe

INFORMATION FOR SEQ ID NO:143: SEQUENCE CHARACTERISTICS: LENGTH: 130 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:143:
Val Tyr Gly Arg Ala Thr Pro Ser Leu Phe Thr Val Val Ser Gly Ala
Leu Tyr Leu Thr Glu Gly Gly Glu His His Ala Val Ala Ser Ile Lys
Tyr His Glu Lys Tyr Ser Pro Asn Thr Leu Asp Asn Asp Val Ala Val
Leu Lys Leu Lys Gln Pro Leu Thr Phe Asn Ala Asn Gln Lys Pro Val
Ala Leu Ala Ser Lys Asp Thr Pro Gly Asp Leu Lys Cys Lys Phe Ser
Gly Trp Gly Leu Asp Ala Tyr Pro Ser Asp Val Leu Pro Asn His Leu
Gln Lys Met Asp Val Leu Thr Tyr Asn Asn Ala Asp Cys Gln Lys Phe
His Asn Ala Gly Pro Lys Ser Asn Thr Ile Tyr Pro Gly Met Leu Cys
Gly Phe

INFORMATION FOR SEQ ID NO:144: SEQUENCE CHARACTERISTICS: LENGTH: 261 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..261 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:144:
CCG ACT ACC TTA GGC GGA TCT ACT GGT CAC AGC AAT GAT ATT GCC CTA 48
Pro Thr Thr Leu Gly Gly Ser Thr Gly His Ser Asn Asp Ile Ala Leu
ATC AAA GTC GAT AGA GAT ATC AAA TTC AGC AAA ACT GTC CAA CCT ATC 96
Ile Lys Val Asp Arg Asp Ile Lys Phe Ser Lys Thr Val Gln Pro Ile
AAA TTG CAC AAA AGT TTA ATA AAT GGA GGT GAA AAA TTG AAA ATT ACT 144
Lys Leu His Lys Ser Leu Ile Asn Gly Gly Glu Lys Leu Lys Ile Thr
GGA TGG GGA TTG ACG AAT CAA AGT CAT AGT GAT GAA CCA GAT GTT CTT 192
Gly Trp Gly Leu Thr Asn Gln Ser His Ser Asp Glu Pro Asp Val Leu
CAA GAG TTG CAT GTA AAA GCA CTT ACT GAT TCT GAG TGC GAG AAA GCT 240
Gln Glu Leu His Val Lys Ala Leu Thr Asp Ser Glu Cys Glu Lys Ala
ACA GGT GAA GAC CAT CCT ACA 261
Thr Gly Glu Asp His Pro Thr

INFORMATION FOR SEQ ID NO:145: SEQUENCE CHARACTERISTICS: LENGTH: 87 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:145:
Pro Thr Thr Leu Gly Gly Ser Thr Gly His Ser Asn Asp Ile Ala Leu

272 Ile Lys Val Asp Arg Asp Ile Lys Phe Ser Lys Thr Val Gin Pro Ile -25 Lys Leu His Lys Ser Leu Ile Asn Gly Gly Glu Lys Leu Lys Ile Thr 40 Gly Trp Gly Leu Thr Asn Gin Ser His Ser Asp Glu Pro Asp Val Leu 55 Gin Glu Leu His Val Lys Ala Leu Thr Asp Ser Glu Cys Glu Lys Ala 70 75 Thr Gly Glu Asp His Pro Thr INFORMATION FOR SEQ ID NO:146: SEQUENCE CHARACTERISTICS: LENGTH: 612 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..612 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:146: GTT TGC AAG ATG GCA ACA ACT CTG TTT'TCT CTT GCC ATA GTG TTG TTT Val Cys Lys Met Ala Thr Thr Leu Phe Ser Leu Ala Ile Val Leu Phe 1 5 10 ATC TCT ACT TCA GAA GAA TCA GCA CAT ATT TCG CAA GGA TCT CGA ATA Ile Ser Thr Ser Glu Glu Ser Ala His Ile Ser Gin Gly Ser Arg Ile 25 TTA GGA GGT AGA AAT GCA AAA CTC GGA GAT GCT CCT TAT CAA GTA TCA Leu Gly Gly Arg Asn Ala Lys Leu Gly Asp Ala Pro Tyr Gin Val Ser 35 40 CTA AGA GAT AAT TTT GGA CAT TTT TGT GGA GGT TCT ATC ATT AGT GAA Leu Arg Asp Asn Phe Gly His Phe Cys Gly Gly Ser Ile Ile Ser Glu 55 AAT TTT GTG ATT ACA GCA GCT CAT TGC CTT GAT GGA TAC ACA GTG AGC Asn Phe Val Ile Thr Ala Ala His Cys Leu Asp Gly Tyr Thr Val Ser 70 75 AAA TTT AAA GTA GCA ACT GGT ACA ATC GAG TAT GGT AAA GGA GGT GAC Lys Phe Lys Val Ala Thr Gly Thr Ile Glu Tyr Gly Lys Gly Gly Asp 90 GAA TAT AAA GTC ATC AAC TTT GTT GTC CGG GAT GAC TTT CAA TAT GTC Glu Tyr Lys Val Ile Asn Phe Val Val Arg Asp Asp Phe Gin Tyr Val 100 105 110 AAA TTA GAA AAT GAT ATT GCT ATA GTA CAA ATA GAT GGA TCT TTT AAA 48 96 144 192 240 288 336 384 WO 97/40058 PCT/US97/06121 273 Lys Leu Glu Asn Asp Ile Ala Ile Val Gin Ile Asp Gly Ser Phe Lys 115 120 125 TTT AAT GAC TAT GTA AAG CCT ATA AAA TTG CCA AAT CAA GAT ACT AAA Phe Asn Asp Tyr Val Lys Pro Ile Lys Leu Pro Asn Gin Asp Thr Lys 130 135 140 GTT GGC GCG GAT GTT GTC CTA ACA GGA TGG GGA AAA ATG GAA GGT GGT Val Gly Ala Asp Val Val Leu Thr Gly Trp Gly Lys Met Glu Gly Gly 145 150 155 160 AAA AAT CCA GAA ACT CTA 
CAA ATC TTG AAC TTA AAA ACA ATT GAT CAA Lys Asn Pro Glu Thr Leu Gin Ile Leu Asn Leu Lys Thr Ile Asp Gin 165 170 175 GGA GAA TGC AAA CAA GCT TTG GCA GAA GTA AAC ACA GTT CTT CCA AGT Gly Glu Cys Lys Gin Ala Leu Ala Glu Val Asn Thr Val Leu Pro Ser 180 185 190 CAA ATT TGT ACC TAT GTT GGT GTT GGC AAA GGA GCT Gin Ile Cys Thr Tyr Val Gly Val Gly Lys Gly Ala 195 200 INFORMATION FOR SEQ ID NO:147: SEQUENCE CHARACTERISTICS: LENGTH: 204 amino acids TYPE: amino acid TOPOLOGY: linear 432 480 528 576 612 Val 1 Ile Leu Leu Asn Lys Glu Lys (ii) MOLECULE (xi) SEQUENCE Cys Lys Met Ala 5 Ser Thr Ser Glu Gly Gly Arg Asn 35 Arg Asp Asn Phe Phe Val Ile Thr Phe Lys Val Ala Tyr Lys Val Ile 100 Leu Glu Asn Asp 115 TYPE: protein DESCRIPTION: SEQ ID Thr Thr Leu Phe Ser 10 Glu Ser Ala His Ile Ala Lys Leu Gly Asp 40 Gly His Phe Cys Gly 55 Ala Ala His Cys Leu 70 Thr Gly Thr Ile Glu 90 Asn Phe Val Val Arg 105 Ile Ala Ile Val Gin 120 NO:147: Leu Ala Ser Gin Ala Pro Gly Ser Asp Gly 75 Tyr Gly Asp Asp Ile Asp Ile Gly Tyr Ile Tyr Lys Phe Gly 125 Leu Arg Val Ser Val Gly Tyr Phe Phe Ile Ser Glu Ser Asp Val Lys WO 97/40058 PCT/US97/06121 27 Phe Asn Asp Tyr Val Lys Pro Ile Lys 130 135 Val Gly Ala Asp Val Val Leu Thr Gly 145 150 Lys Asn Pro Glu Thr Leu Gin Ile Leu J 165 Gly Glu Cys Lys Gin Ala Leu Ala Glu 180 185 Gin Ile Cys Thr Tyr Val Gly Val Gly 195 200 INFORMATION FOR SEQ ID NO:148: SEQUENCE CHARACTERISTICS: LENGTH: 641 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA 4 Leu Pro Asn Gin Asp 140 Prp Gly Lys Met Glu 155 \sn Leu Lys Thr Ile 170 lal Asn Thr Val Leu 190 Lys Gly Ala Thr Gly Asp 175 Pro Lys Gly 160 Gin Ser (ix) FEATURE: NAME/KEY: CDS LOCATION: 3..641 OTHER INFORMATION: /note= "At pos. bp 91, change G to N; at pos. bp 385, change A to M; at bp 404, change A to W. At pos. aa 30, 128 and 134, substitute Xaa." 
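The Xaa substitutions in this note follow from the ambiguity codes in the nucleotide sequence: a codon containing N, M or W (here GNT, AMA and GAW) can stand for codons that encode different amino acids, so the residue cannot be assigned. The following is a minimal illustrative sketch in Python, not part of the original listing. It assumes the standard genetic code and the standard IUPAC nucleotide ambiguity codes; the function name `translate_codon` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (assumed standard set).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, indexed with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# One-letter to three-letter names as used in the listing ("Ter" = stop).
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate_codon(codon):
    """Translate one codon; return 'Xaa' if its ambiguity codes
    expand to codons that encode more than one amino acid."""
    residues = {
        AMINO[16 * BASES.index(a) + 4 * BASES.index(b) + BASES.index(c)]
        for a, b, c in product(*(IUPAC[n] for n in codon.upper()))
    }
    if len(residues) == 1:
        return THREE[residues.pop()]
    return "Xaa"


if __name__ == "__main__":
    # Codons taken from SEQ ID NO:148 (GNT, AMA, GAW) plus two controls.
    for codon in ("GNT", "AMA", "GAW", "GAA", "AAR"):
        print(codon, translate_codon(codon))
```

An ambiguous codon is still translated when every expansion agrees: AAR, the first codon of SEQ ID NO:134, covers AAA and AAG, both of which encode Lys.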
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:148: CA AAC ATG AAA CTT TAT GTT TTT ACA TTG ATC ATC GCA GCC GCT GTT Asn Met Lys Leu Tyr Val Phe Thr Leu Ile Ile Ala Ala Ala Val 1 5 10 GCT GCA CCA ACA AAC GAA ATC GAT TGG ATA GGC TTT CCC GAA GNT TTT Ala Ala Pro Thr Asn Glu Ile Asp Trp Ile Gly Phe Pro Glu Xaa Phe 20 25 CCT CGT GTA GTT GGT GGC CAG ACT GCC AAA CCT CAC CAG TTC CCC TGG Pro Arg Val Val Gly Gly Gin Thr Ala Lys Pro His Gin Phe Pro Trp 40 CAG GTT TCC CTG CAA AGG TCC GGA AAG CAT TTG TGC GGC GGT TCC ATC Gin Val Ser Leu Gin Arg Ser Gly Lys His Leu Cys Gly Gly Ser Ile 55 TTG AAC GAC AGG TGG GTC TTG ACA GCC GCG CAC TGC ATC AGC GGA ACT Leu Asn Asp Arg Trp Val Leu Thr Ala Ala His Cys Ile Ser Gly Thr 70 GAA AAT TAC GAG GCT GTA GTC GGA AAA CAC GAT TTG TCG AAA AGC GAA Glu Asn Tyr Glu Ala Val Val Gly Lys His Asp Leu Ser Lys Ser Glu 85 90 TCA TCT GAG CAA CGT TGC GCC TAC AAG AGG ACC ATC GTC CAC TCG TCC WO 97/40058 PCT/US97/06121 275 Ser Ser Glu Gin Arg Cys Ala Tyr Lys Arg Thr Ile Val His Ser Ser 100 105 110 TTC ACT GGA AGG GTG GGT CCT TAC GAT GTC GCT TTG ATT GAA TTA GAA Phe Thr Gly Arg Val Gly Pro Tyr Asp Val Ala Leu Ile Glu Leu Glu 115 120 125 AMA CCT TTC AAA TTG AAC GAW AAA TGT AAG CCA ATC AGA CTT CCA TTG Xaa Pro Phe Lys Leu Asn Xaa Lys Cys Lys Pro Ile Arg Leu Pro Leu 130 135 140 AAA GAT GAA GCA CAT TCT GGA CAA GTG ACG CTT TCT GGT TGG GGA TCT Lys Asp Glu Ala His Ser Gly Gin Val Thr Leu Ser Gly Trp Gly Ser 145 150 155 ACT TCA ACT ACC ATC TTC CCC ACT TAC CCA AAT GAA CTA CAG TAT GTT Thr Ser Thr Thr Ile Phe Pro Thr Tyr Pro Asn Glu Leu Gin Tyr Val 160 165 170 175 GAC AAA CCG ATT GTT CCA TAT ACT GAT TGC GAA AAT GCT ATG GGC GGA Asp Lys Pro Ile Val Pro Tyr Thr Asp Cys Glu Asn Ala Met Gly Gly 180 185 190 CCA GGA GCA TCT CCT CTT GAT CCT TTG AAC ATC TGC ACT GGT CCC TTG Pro Gly Ala Ser Pro Leu Asp Pro Leu Asn Ile Cys Thr Gly Pro Leu 195 200 205 ACT GGT GGC ATC AGT GCT Thr Gly Gly Ile Ser Ala 210 INFORMATION FOR SEQ ID NO:149: SEQUENCE CHARACTERISTICS: LENGTH: 213 amino acids TYPE: amino acid 
TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:149: Asn Met Lys Leu Tyr Val Phe Thr Leu Ile Ile Ala Ala Ala Val Ala 1 5 10 Ala Pro Thr Asn Glu Ile Asp Trp Ile Gly Phe Pro Glu Xaa Phe Pro 25 Arg Val Val Gly Gly Gin Thr Ala Lys Pro His Gin Phe Pro Trp Gin 40 Val Ser Leu Gin Arg Ser Gly Lys His Leu Cys Gly Gly Ser Ile Leu 55 Asn Asp Arg Trp Val Leu Thr Ala Ala His Cys Ile Ser Gly Thr Glu 65 70 75 Asn Tyr Glu Ala Val Val Gly Lys His Asp Leu Ser Lys Ser Glu Ser 90 383 431 479 527 575 623 641 WO 97/40058 PCTIUS97/06121 276 Ser Glu Gin Arg Cys Ala Tyr Lys Arg Thr Ile Val His Ser Ser Phe 100 105 110 Thr Gly Arg Val Gly Pro Tyr Asp Val Ala Leu Ile Glu Leu Glu Xaa 115 120 125 Pro Phe Lys Leu Asn Xaa Lys Cys Lys Pro Ile Arg Leu Pro Leu Lys 130 135 140 Asp Glu Ala His Ser Gly Gin Val Thr Leu Ser Gly Trp Gly Ser Thr 145 150 155 160 Ser Thr Thr Ile Phe Pro Thr Tyr Pro Asn Glu Leu Gin Tyr Val Asp 165 170 175 Lys Pro Ile Val Pro Tyr Thr Asp Cys Glu Asn Ala Met Gly Gly Pro 180 185 190 Gly Ala Ser Pro Leu Asp Pro Leu Asn Ile Cys Thr Gly Pro Leu Thr 195 200 205 Gly Gly Ile Ser Ala 210 INFORMATION FOR SEQ ID NO:150: SEQUENCE CHARACTERISTICS: LENGTH: 626 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 3..626 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:150: TG ACT ATG AAC TTT GCA TTG TGG TTT GTG ACT CTT GTG TCC ATC ACC 47 Thr Met Asn Phe Ala Leu Trp Phe Val Thr Leu Val Ser Ile Thr 1 5 10 AGC GCT GAT CCA ATT CGT GTT TCC TCA ATT GGA AAT GCT AAT ATG GAA Ser Ala Asp Pro Ile Arg Val Ser Ser Ile Gly Asn Ala Asn Met Glu 25 TCT CGC GTT GTT GGT GGC GAA AAC GCA GAA GTA GGA GCT GCT CCT TAC 143 Ser Arg Val Val Gly Gly Glu Asn Ala Glu Val Gly Ala Ala Pro Tyr 40 CAA GTT TCT TTG AAA TAC AAT AAT GGA GCT CAT TTT TGC GGA GGT GTC 191 Gin Val Ser Leu Lys Tyr Asn Asn Gly Ala His Phe Cys Gly Gly Val 55 GTG ATA ACC AAA ACT TGG GTG CTA ACT GCT GCA CGT TGT ATT CAC GAG 239 Val Ile Thr Lys Thr Trp Val 
Leu Thr Ala Ala Arg Cys Ile His Glu 70 GAG GAA CCC GAC AGA TTC ACA GTG GTT GTT GGT ACC AAC ACT TTA AAT 287 WO 97/40058 PCT/US97/06121 Glu

GCT

Ala

TTC

Phe

TCA

Ser

ATG

Met

TTA

Leu 160

TTA

Leu

ATG

Met

CTT

Leu 277 Pro Asp Arg Phe Thr Val Val Val Gly Thr Asn Thr Leu Asn 85 90 GGA GAA GGT TAC AAC GTT AAA CAG ATA GTT ATT CAT ATG CAA Gly Glu Gly Tyr Asn Val Lys Gin Ile Val Ile His Met Gin 100 105 110 CAA GTT TAT CTT CTG AAC GAT ATT GGT TTG ATC GAA ACC GAA Gin Val Tyr Leu Leu Asn Asp Ile Gly Leu Ile Glu Thr Glu 115 120 125 ATA CAA TTT CAT GAT CTG GTT AAG CCA ATC TCA GTC CCC AAT Ile Gin Phe His Asp Leu Val Lys Pro Ile Ser Val Pro Asn 130 135 140 GTT GAA GAT GGC ACG AGA GTT ACT CTT TTT GGA TGG GGA AAT Val Glu Asp Gly Thr Arg Val Thr Leu Phe Gly Trp Gly Asn 150 155 GCT GAA GGA CAT ATG CCA AAC CAT TTA CAA ACA ATT GAT TTG Ala Glu Gly His Met Pro Asn His Leu Gin Thr Ile Asp Leu 165 170 175 ATA AAT CTG AGT GAA TGC TCC CGA TTA TTA CCT GAA CCA AGT Ile Asn Leu Ser Glu Cys Ser Arg Leu Leu Pro Glu Pro Ser 180 185 190 AGT ACA AAA CAC ATT TGC ACC TTT GTT TCA TAT GGA AAA GGA Ser Thr Lys His Ile Cys Thr Phe Val Ser Tyr Gly Lys Gly 195 200 205 335 383 431 479 527 575 623 626 INFORMATION FOR SEQ ID NO:151: SEQUENCE CHARACTERISTICS: LENGTH: 208 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:151: Thr Met Asn Phe Ala Leu Trp Phe Val Thr Leu Val 1 5 10 Ala Asp Pro Ile Arg Val Ser Ser Ile Gly Asn Ala 25 Arg Val Val Gly Gly Glu Asn Ala Glu Val Gly Ala 40 Val Ser Leu Lys Tyr Asn Asn Gly Ala His Phe Cys 55 Ile Thr Lys Thr Trp Val Leu Thr Ala Ala Arg Cys 70 75 Ser Ile Thr Asn Met Glu Ala Pro Tyr Gly Gly Val Ile His Glu Ser Ser Gin Val Glu WO 97/40058 PCTIUS97/06121 278 Glu Pro Asp Arg Phe Thr Val Val Val Gly Thr Asn Thr Leu Asn Ala 90 Gly Gly Glu Gly Tyr Asn Val Lys Gin Ile Val Ile His Met Gin Phe 100 105 110 Asn Gin Val Tyr Leu Leu Asn Asp Ile Gly Leu Ile Glu Thr Glu Ser 115 120 125 Pro Ile Gin Phe His Asp Leu Val Lys Pro Ile Ser Val Pro Asn Met 130 135 140 His Val Glu Asp Gly Thr Arg Val Thr Leu Phe Gly Trp Gly Asn Leu 145 150 155 160 Thr Ala Glu Gly His Met Pro Asn His Leu Gin Thr Ile Asp Leu Leu 165 170 175 Thr Ile Asn Leu Ser Glu 
Cys Ser Arg Leu Leu Pro Glu Pro Ser Met 180 185 190 Ile Ser Thr Lys His Ile Cys Thr Phe Val Ser Tyr Gly Lys Gly Leu 195 200 205 INFORMATION FOR SEQ ID NO:152: SEQUENCE CHARACTERISTICS: LENGTH: 815 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..762 OTHER INFORMATION: /note= "At pos. bp 454, change G to V; at 456, G to V; at 457, A to M; at 460, A to R; at 470, G to S; at 493, A to R. At pos. aa 120, 136, 152, 153, 154, 157 and 165, substitute Xaa." (xi) SEQUENCE DESCRIPTION: SEQ ID NO:152: ATT TTA TTA AGC GCA TTA TTT GCA AGT GTA ATT TGC TCC TTT AAC GCG 48 Ile Leu Leu Ser Ala Leu Phe Ala Ser Val Ile Cys Ser Phe Asn Ala 1 5 10 GAA GTA CAA AAT CGA ATC GTT GGT GGC AAT GAT GTA AGT ATT TCA AAA 96 Glu Val Gin Asn Arg Ile Val Gly Gly Asn Asp Val Ser Ile Ser Lys 25 ATT GGG TGG CAA GTA TCT ATT CAA AGT AAT AAC CAA CAT TTC TGT GGT 144 Ile Gly Trp Gin Val Ser Ile Gin Ser Asn Asn Gin His Phe Cys Gly 35 40 GGT TCA ATC ATT GCT AAA GAT TGG GTA CTG ACT TCT TCT CAA TGC GTC 192 Gly Ser Ile Ile Ala Lys Asp Trp Val Leu Thr Ser Ser Gin Cys Val 55 GTG GAC AAA CAA AGT CCA CCG AAG GAT TTA ACT GTT CGT GTT GGA ACT 240 WO 97/40058 PCTIUS97/06121 279 Val Asp Lys Gin Ser Pro Pro Lys Asp Leu Thr Val Arg Val Gly Thr 70 75 AGC ACT CAC AAT GAT GGA GGA AAA GTG TAT GAT GTT ATT GAA ATT ATA 288 Ser Thr His Asn Asp Gly Gly Lys Val Tyr Asp Val Ile Glu Ile Ile 85 90 AAA CAT CCG AAA TAT AAT AAA GCA GTG CCA GAT GAT TTT GAT GTT GCA 336 Lys His Pro Lys Tyr Asn Lys Ala Val Pro Asp Asp Phe Asp Val Ala 100 105 110 CTT TTA CGG ATC AAA GAG CCA ATC ATT TAC TCC ATG CAC AGT AAC TCC 384 Leu Leu Arg Ile Lys Glu Pro Xaa Ile Tyr Ser Met His Ser Asn Ser 115 120 125 TGT AAA ATT AAT ACA ATC GGG AAA GAA GTA CCG AAG GGA ACA ACT TTG 432 Cys Lys Ile Asn Thr Ile Gly Xaa Glu Val Pro Lys Gly Thr Thr Leu 130 135 140 AGT GTA ACT GGA TGG GGC GCM VCV MAG RAA TGG GGG CSC AAT TTC GCC 480 Ser Val Thr Gly Trp Gly Ala Xaa Xaa Xaa Trp Gly Xaa Asn Phe Ala 
145 150 155 160 AAA GTT ACA AGA RAG TTA AAG TTA AAG CTA CTC AAG TCA AGA ATG CAA 528 Lys Val Thr Arg Xaa Leu Lys Leu Lys Leu Leu Lys Ser Arg Met Gin 165 170 175 GAA CAG TCT GCT ATT AAC AGT GAC ATC ATT TCT GAC AGT ATG ATG TGC 576 Glu Gin Ser Ala Ile Asn Ser Asp Ile Ile Ser Asp Ser Met Met Cys 180 185 190 GCT GGT TTT CCT CAA GGA CAA AAA GAT ACT TGT CAT GGG GAT AGC GGT 624 Ala Gly Phe Pro Gin Gly Gin Lys Asp Thr Cys His Gly Asp Ser Gly 195 200 205 GGC ACT GTA GAT AAA AAA CAG GTT CAA GTA GGA GTT ATA TCC TGG AGG 672 Gly Thr Val Asp Lys Lys Gin Val Gin Val Gly Val Ile Ser Trp Arg 210 215 220 CGA GGA TGC GCG CGA CCT GGA TAT CCT GGC GTA TAT ACA AAA TTG AGC 720 Arg Gly Cys Ala Arg Pro Gly Tyr Pro Gly Val Tyr Thr Lys Leu Ser 225 230 235 240 CAC CCG GAA ATC CAA CAG TTT ATT AAA AAC AAT GTA AAA CTT 762 His Pro Glu Ile Gin Gin Phe Ile Lys Asn Asn Val Lys Leu 245 250 TAAATCATAA AACTGTATGA AAATAACAAT AACAATTACG GGAAAAAAAA AAA 815 INFORMATION FOR SEQ ID NO:153: SEQUENCE CHARACTERISTICS: LENGTH: 254 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:153: WO97/40058 PCT/US97/06121 280 Ile Leu Leu Ser Ala Leu Phe Ala Ser Val Ile Cys Ser Phe Asn Ala 1 5 10 Glu Val Gin Asn Arg Ile Val Gly Gly Asn Asp Val Ser Ile Ser Lys 25 Ile Gly Trp Gin Val Ser Ile Gin Ser Asn Asn Gin His Phe Cys Gly 40 Gly Ser Ile Ile Ala Lys Asp Trp Val Leu Thr Ser Ser Gin Cys Val 55 Val Asp Lys Gin Ser Pro Pro Lys Asp Leu Thr Val Arg Val Gly Thr 65 70 75 Ser Thr His Asn Asp Gly Gly Lys Val Tyr Asp Val Ile Glu Ile Ile 90 Lys His Pro Lys Tyr Asn Lys Ala Val Pro Asp Asp Phe Asp Val Ala 100 105 110 Leu Leu Arg Ile Lys Glu Pro Xaa Ile Tyr Ser Met His Ser Asn Ser 115 120 125 Cys Lys Ile Asn Thr Ile Gly Xaa Glu Val Pro Lys Gly Thr Thr Leu 130 135 140 Ser Val Thr Gly Trp Gly Ala Xaa Xaa Xaa Trp Gly Xaa Asn Phe Ala 145 150 155 160 Lys Val Thr Arg Xaa Leu Lys Leu Lys Leu Leu Lys Ser Arg Met Gin 165 170 175 Glu Gin Ser Ala Ile Asn Ser Asp Ile Ile Ser Asp Ser Met Met 
Cys 180 185 190 Ala Gly Phe Pro Gin Gly Gin Lys Asp Thr Cys His Gly Asp Ser Gly 195 200 205 Gly Thr Val Asp Lys Lys Gin Val Gin Val Gly Val Ile Ser Trp Arg 210 215 220 Arg Gly Cys Ala Arg Pro Gly Tyr Pro Gly Val Tyr Thr Lys Leu Ser 225 230 235 240 His Pro Glu Ile Gin Gin Phe Ile Lys Asn Asn Val Lys Leu 245 250 INFORMATION FOR SEQ ID NO:154: SEQUENCE CHARACTERISTICS: LENGTH: 177 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:154: WO 97/40058 PCT/US97/06121 281

CACGAGATTT TATTAAGCGC ATTATTTGCA AGTGTAATTT GCTCCTTTAA CGCGGAAGTA 60
CAAAATCGAA TCGTTGGTGG CAATGATGTA AGTATTTCAA AAATTGGGTG GCAAGTATCT 120
ATTCAAAGTA ATAAACAACA TTTCTGTGGT GGTTCAATCA TTGCTAAAGA TGGGTCC 177

INFORMATION FOR SEQ ID NO:155: SEQUENCE CHARACTERISTICS: LENGTH: 359 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:155:
GGGTTCGAAT TTGTGGATCG AAAAGGCAGA TATTACGATG TAGAAAGATT TGTGATGCAC 60
CATAATTATA CTGGAAAGAT AGTTGCCAAT GTCGCTGATA TAGGTCTAAT AAAACTAGCA 120
GAAGATATAA AATTCAGTGA CAAGGTACAA CCTGTAAAAA TTCATCAAAC TCAAATCAAG 180
GGCGGAGAGA TTTGCAAAGC TACTGGATGG GGCAGGTTGG GTGCTGATCA GCCTGTACCA 240
AATAAATTAC AACAATTGGA GACAATTGCT ATTAGTGATG AGAAATGTTA TGCAGATACA 300
GGGTTTTTAG AACCTACATC TCAAATATGT GTATTCAGTG CATTTGGAAA AGGAGTTGT 359

INFORMATION FOR SEQ ID NO:156: SEQUENCE CHARACTERISTICS: LENGTH: 855 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..759 OTHER INFORMATION: /note= "At pos. bp 693, change C to N. At pos. aa 231, substitute Xaa." (xi) SEQUENCE DESCRIPTION: SEQ ID NO:156:
ATG GCT TAT ATT ATA TTA GTT ACT TTA ATT AGT TTG GGA TCG TTG GTT 48
Met Ala Tyr Ile Ile Leu Val Thr Leu Ile Ser Leu Gly Ser Leu Val
TCT TCC GAA TAC CTT TCG TTT TCT ACT GAT CCT CGG ATA ATT GGT GGT 96
Ser Ser Glu Tyr Leu Ser Phe Ser Thr Asp Pro Arg Ile Ile Gly Gly
GAA GAT GCT CCC GAG GGT TCT GCA CCA TAT CAG GTT TCA TTA AGA AAT 144
Glu Asp Ala Pro Glu Gly Ser Ala Pro Tyr Gln Val Ser Leu Arg Asn
TCT GAC TTG CAG CAT TTT TGT GGT GGT TCC ATC CTA AAC AAA CGA TGG 192
Ser Asp Leu Gln His Phe Cys Gly Gly Ser Ile Leu Asn Lys Arg Trp
ATT TTA ACA GCA GCA CAT TGT CTC GAA CCT GGT TTT TTA AAT TCT GTA 240
Ile Leu Thr Ala Ala His Cys Leu Glu Pro Gly Phe Leu Asn Ser Val
TAC ATG GGT TCG AAT TTG TTG GAT CGA AAA GGC AGA TAT TAC GAT GTA 288
Tyr Met Gly Ser Asn Leu Leu Asp Arg Lys Gly Arg Tyr Tyr Asp Val
GAA AGA TTT GTG ATG CAC CAT AAT TAT ACT GGA AAG ATA GTT GCC AAT 336
Glu Arg Phe Val Met His His Asn Tyr Thr Gly Lys Ile Val Ala Asn
GTC GCT GAT ATA GGT CTA ATA AAA CTA GCA GAA GAT ATA AAA TTC AGT 384
Val Ala Asp Ile Gly Leu Ile Lys Leu Ala Glu Asp Ile Lys Phe Ser
GAC AAG GTA CAA CCT GTA AAA ATT CAT CAA ACT CAA ATC AAG GGC GGA 432
Asp Lys Val Gln Pro Val Lys Ile His Gln Thr Gln Ile Lys Gly Gly
GAG ATT TGC AAA GCT ACT GGA TGG GGC AGG TTG GGT GCT GAT CAG CCT 480
Glu Ile Cys Lys Ala Thr Gly Trp Gly Arg Leu Gly Ala Asp Gln Pro
GTA CCA AAT AAA TTA CAA CAA TTG GAG ACA ATT GCT ATT AGT GAT GAG 528
Val Pro Asn Lys Leu Gln Gln Leu Glu Thr Ile Ala Ile Ser Asp Glu
AAA TGT TAT GCA GAT ACA GGG TTT TTA GAA CCT ACA TCT CAA ATA TGT 576
Lys Cys Tyr Ala Asp Thr Gly Phe Leu Glu Pro Thr Ser Gln Ile Cys
GTA TTC AGT GCA TTT GGA AAA GGA GTT TGT TTT GGA GAT TCT GGT GGT 624
Val Phe Ser Ala Phe Gly Lys Gly Val Cys Phe Gly Asp Ser Gly Gly
CCA TTA GTT TAC AAA GGT GAA CAA GTA GGA GTT GCA TCA TTC ATC ATG 672
Pro Leu Val Tyr Lys Gly Glu Gln Val Gly Val Ala Ser Phe Ile Met
ATC ACT TGT GGT GGT GGC AGN CCA GAT GTA TTT GTT AGA GTA CTC GAT 720
Ile Thr Cys Gly Gly Gly Xaa Pro Asp Val Phe Val Arg Val Leu Asp
TAT CAG GAT TGG ATA AAT TCA TTT ATT TCT GGA GAT AAC TAGTCTTTAA 769
Tyr Gln Asp Trp Ile Asn Ser Phe Ile Ser Gly Asp Asn
TGTAAAATGA ACTATTATAA TATATATTTT TTATTCTTAT AAAATATATA CATTTTATTA 829
CGCACAAAAA AAAAAAAAAA AAAAAA 855

INFORMATION FOR SEQ ID NO:157: SEQUENCE CHARACTERISTICS: LENGTH: 253 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:157:
Met Ala Tyr Ile Ile Leu Val Thr Leu Ile Ser Leu Gly Ser Leu Val
Ser Ser Glu Tyr Leu Ser Phe Ser Thr Asp Pro Arg Ile Ile Gly Gly
Glu Asp Ala Pro Glu Gly Ser Ala Pro Tyr Gln Val Ser Leu Arg Asn
Ser Asp Leu Gln His Phe Cys Gly Gly Ser Ile Leu Asn Lys Arg Trp
Ile Leu Thr Ala Ala His Cys Leu Glu Pro Gly Phe Leu Asn Ser Val
Tyr Met Gly Ser Asn Leu Leu Asp Arg Lys Gly Arg Tyr Tyr Asp Val
Glu Arg Phe Val Met His His Asn Tyr Thr Gly Lys Ile Val Ala Asn
Val Ala Asp Ile Gly Leu Ile Lys Leu Ala Glu Asp Ile Lys Phe Ser
Asp Lys Val Gln Pro Val Lys Ile His Gln Thr Gln Ile Lys Gly Gly
Glu Ile Cys Lys Ala Thr Gly Trp Gly Arg Leu Gly Ala Asp Gln Pro
Val Pro Asn Lys Leu Gln Gln Leu Glu Thr Ile Ala Ile Ser Asp Glu
Lys Cys Tyr Ala Asp Thr Gly Phe Leu Glu Pro Thr Ser Gln Ile Cys
Val Phe Ser Ala Phe Gly Lys Gly Val Cys Phe Gly Asp Ser Gly Gly
Pro Leu Val Tyr Lys Gly Glu Gln Val Gly Val Ala Ser Phe Ile Met
Ile Thr Cys Gly Gly Gly Xaa Pro Asp Val Phe Val Arg Val Leu Asp
Tyr Gln Asp Trp Ile Asn Ser Phe Ile Ser Gly Asp Asn

INFORMATION FOR SEQ ID NO:158: SEQUENCE CHARACTERISTICS: LENGTH: 864 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 2..781 OTHER INFORMATION: /note= "At pos. bp 456, change G to K; at pos. bp 504, change A to R. At pos. aa 152 and 168, substitute Xaa."
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:158: T ACA AAA CCT ATT ACA ATT CAA AAG TTG TTC CAA ATG ATG GCA AAT 46 Thr Lys Pro Ile Thr Ile Gin Lys Leu Phe Gin Met Met Ala Asn 1 5 10 TTT GTG CTA TTC ACC TTA CTA GCC TTA GTA TCA GTA GCA TGT TCC AAA 94 Phe Val Leu Phe Thr Leu Leu Ala Leu Val Ser Val Ala Cys Ser Lys 20 25 TAT ATT GAT CCA AGA ATC ATT GGA GGC GAA GAT GCT CCT GAA GGC TCG 142 Tyr Ile Asp Pro Arg Ile Ile Gly Gly Glu Asp Ala Pro Glu Gly Ser 40 GCT CCG TAT CAA GTT TCA CTG AGA AAT CGG GAC CTA GAG CAT TTC TGT 190 Ala Pro Tyr Gin Val Ser Leu Arg Asn Arg Asp Leu Glu His Phe Cys 55 GGT GGC TCC ATC TTA AAC AAA CGT TGG ATT GTG ACG GCT GCA CAT TGC 238 Gly Gly Ser Ile Leu Asn Lys Arg Trp Ile Val Thr Ala Ala His Cys 70 CTA AAA CCT GGC ATT TTA AAA TCC GTC TAT ATG GGA TCA AAC TCA TTA 286 Leu Lys Pro Gly Ile Leu Lys Ser Val Tyr Met Gly Ser Asn Ser Leu 85 90 GAT GGC AAT GGT ACA TAC TAC GAC GTC GAA CGT TTT GTG ATG CAT CAT 334 Asp Gly Asn Gly Thr Tyr Tyr Asp Val Glu Arg Phe Val Met His His 100 105 110 AAA TAT ACA CCA AAA ATT ACT GTC AAC TAT GCT GAT ATT GGT CTA ATA 382 Lys Tyr Thr Pro Lys Ile Thr Val Asn Tyr Ala Asp Ile Gly Leu Ile 115 120 125 AAA GTG ACA AAA GAC ATT ATA TTC AGT GAC AAA GTT CAA CCA ATC AAA 430 Lys Val Thr Lys Asp Ile Ile Phe Ser Asp Lys Val Gin Pro Ile Lys 130 135 140 ATA GCA AAA AAA ATA TCA AGG GTG GKG AAT CTG CAA GGC CAC TGG TTG 478 Ile Ala Lys Lys Ile Ser Arg Val Xaa Asn Leu Gin Gly His Trp Leu 145 150 155 GGG TCG ATT GGC GGA TGG GGC CCC CRG TAC CAA ACG AAT TGC AAC AAG 526 WO97/40058 PCT/US97/06121 285 Gly Ser Ile Gly Gly Trp Gly Pro Xaa Tyr Gin Thr Asn Cys Asn Lys 160 165 170 175 GTG GAA ACC ACT GCA ATA ACA AAT GAA AAG TGC TAC GAA TTG TCT CAA 574 Val Glu Thr Thr Ala Ile Thr Asn Glu Lys Cys Tyr Glu Leu Ser Gin 180 185 190 TTC GTT GAG CCA ACT TCG CAA ATA TGT ACA TTA AGG GAA TTT TTA AGA 622 Phe Val Glu Pro Thr Ser Gin Ile Cys Thr Leu Arg Glu Phe Leu Arg 195 200 205 GGC ATT TGC TTT GGT GAT TCT GGT GGA CCA CTG GTT TAC AAA GGT GAA 670 Gly Ile Cys Phe Gly Asp Ser Gly Gly 
Pro Leu Val Tyr Lys Gly Glu 210 215 220 CTG GTT GGC GTT TCT TCG TTT GTC TTG TAC ACT TGC GGA GCT GGA CGC 718 Leu Val Gly Val Ser Ser Phe Val Leu Tyr Thr Cys Gly Ala Gly Arg 225 230 235 CCA GAT GTT TTT GTT AAA GTG CGT GAT TTC CAA TCT TGG ATC AAT TCT 766 Pro Asp Val Phe Val Lys Val Arg Asp Phe Gin Ser Trp Ile Asn Ser 240 245 250 255 GAA ATT AGA AAA AAA TAAATAGATT TCAATCATGA TTTGTTGTAA TAAAAAATGG 821 Glu Ile Arg Lys Lys 260 TTAAATAAAG GCAGCATAAT TTAAAAAAAA AAAAAAAA AAA 864 INFORMATION FOR SEQ ID NO:159: SEQUENCE CHARACTERISTICS: LENGTH: 260 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:159: Thr Lys Pro Ile Thr Ile Gin Lys Leu Phe Gin Met Met Ala Asn Phe 1 5 10 Val Leu Phe Thr Leu Leu Ala Leu Val Ser Val Ala Cys Ser Lys Tyr 25 Ile Asp Pro Arg Ile Ile Gly Gly Glu Asp Ala Pro Glu Gly Ser Ala 40 Pro Tyr Gin Val Ser Leu Arg Asn Arg Asp Leu Glu His Phe Cys Gly 55 Gly Ser Ile Leu Asn Lys Arg Trp Ile Val Thr Ala Ala His Cys Leu 70 75 Lys Pro Gly Ile Leu Lys Ser Val Tyr Met Gly Ser Asn Ser Leu Asp 85 90 Gly Asn Gly Thr Tyr Tyr Asp Val Glu Arg Phe Val Met His His Lys 100 105 110 WO 97/40058 PCT/US97/06121 286 Tyr Thr Pro Lys Ile Thr Val Asn Tyr Ala Asp Ile Gly Leu Ile Lys 115 120 125 Val Thr Lys Asp Ile Ile Phe Ser Asp Lys Val Gin Pro Ile Lys Ile 130 135 140 Ala Lys Lys Ile Ser Arg Val Xaa Asn Leu Gin Gly His Trp Leu Gly 145 150 155 160 Ser Ile Gly Gly Trp Gly Pro Xaa Tyr Gin Thr Asn Cys Asn Lys Val 165 170 175 Glu Thr Thr Ala Ile Thr Asn Glu Lys Cys Tyr Glu Leu Ser Gin Phe 180 185 190 Val Glu Pro Thr Ser Gin Ile Cys Thr Leu Arg Glu Phe Leu Arg Gly 195 200 205 Ile Cys Phe Gly Asp Ser Gly Gly Pro Leu Val Tyr Lys Gly Glu Leu 210 215 220 Val Gly Val Ser Ser Phe Val Leu Tyr Thr Cys Gly Ala Gly Arg Pro 225 230 235 240 Asp Val Phe Val Lys Val Arg Asp Phe Gin Ser Trp Ile Asn Ser Glu 245 250 255 Ile Arg Lys Lys 260 INFORMATION FOR SEQ ID NO:160: SEQUENCE CHARACTERISTICS: LENGTH: 595 base pairs TYPE: nucleic acid 
STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 2..595 OTHER INFORMATION: /note= "At pos. bp 509, change C to Y; at pos. 556, change C to Y; at pos. 557, change C to Y; at 561, change C to Y; at 573, change C to Y. At pos. aa. 170, 186, 187, 190 and 191, substitute Xaa." (xi) SEQUENCE DESCRIPTION: SEQ ID NO:160: T GGT TCC ATC TTG AAC AAA CGT TGG ATT GTA ACA GCT GCA CAT TGC 46 Gly Ser Ile Leu Asn Lys Arg Trp Ile Val Thr Ala Ala His Cys 1 5 10 CTA AAC GCT GGC ATT TTA AAA TCC GTC TAT TTG GGA TCA AAC TCA TTA 94 Leu Asn Ala Gly Ile Leu Lys Ser Val Tyr Leu Gly Ser Asn Ser Leu 25 GAT GGC GAT GGT ACA TAC TAC GAC GTC GAA CGT TTT GTG ATG CAT GAT 142 Asp Gly Asp Gly Thr Tyr Tyr Asp Val Glu Arg Phe Val Met His Asp 40 AAA TAT ACA CCA AGA ATC ACT GTC AAC TAT GCT GAT ATT GGT CTA ATA 190 Lys Tyr Thr Pro Arg Ile Thr Val Asn Tyr Ala Asp Ile Gly Leu Ile 50 55 AAA GTG GCA AAA GAC ATT GTA TTC GGT GAC AAA GTC CAA CCG ATC AAA 238 Lys Val Ala Lys Asp Ile Val Phe Gly Asp Lys Val Gln Pro Ile Lys 70 ATT AGC AAG AGA AAC ATC AAG GGT GGT GAA ATT TGC AAG GCA ACT GGT 286 Ile Ser Lys Arg Asn Ile Lys Gly Gly Glu Ile Cys Lys Ala Thr Gly 85 90 TGG GGT CTA TTA GGT TCT GTG GAC TCA GTA CCA AAC GAA TTA CAA CAA 334 Trp Gly Leu Leu Gly Ser Val Asp Ser Val Pro Asn Glu Leu Gln Gln 100 105 110 GTA GAA ACC ACT GCA ATA ACA GAC GAA AAG TGC TTT GAA TTG ACT CAA 382 Val Glu Thr Thr Ala Ile Thr Asp Glu Lys Cys Phe Glu Leu Thr Gln 115 120 125 TTC ATT GAC CCA ACT TCG CAA ATA TGT ACA TTC AGG GAA TTT GGT AGA 430 Phe Ile Asp Pro Thr Ser Gln Ile Cys Thr Phe Arg Glu Phe Gly Arg 130 135 140 GGC ATT TGC TTT GGT GAT TCT GGT GGA CCA CTA GTT TAC AAA AAT GAA 478 Gly Ile Cys Phe Gly Asp Ser Gly Gly Pro Leu Val Tyr Lys Asn Glu 145 150 155 CTT GTT GGC ATT ACA TCG ATG CAC TTA TAC YCC TGC AGA GGT GGC AGG 526 Leu Val Gly Ile Thr Ser Met His Leu Tyr Xaa Cys Arg Gly Gly Arg 160 165 170 175 CCA GAT ATT TTT TGT GAA AGT GCG AGA TTY YCA AYC CTG GAT TAA AYT
574 Pro Asp Ile Phe Cys Glu Ser Ala Arg Phe Xaa Xaa Leu Asp Xaa Xaa 180 185 190 CTG AAA TTG AAA AAA ATT AAA 595 Leu Lys Leu Lys Lys Ile Lys 195 INFORMATION FOR SEQ ID NO:161: SEQUENCE CHARACTERISTICS: LENGTH: 198 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:161: Gly Ser Ile Leu Asn Lys Arg Trp Ile Val Thr Ala Ala His Cys Leu 1 5 10 Asn Ala Gly Ile Leu Lys Ser Val Tyr Leu Gly Ser Asn Ser Leu Asp 25 Gly Asp Gly Thr Tyr Tyr Asp Val Glu Arg Phe Val Met His Asp Lys 40 Tyr Thr Pro Arg Ile Thr Val Asn Tyr Ala Asp Ile Gly Leu Ile Lys 55 Val Ala Lys Asp Ile Val Phe Gly Asp Lys Val Gln Pro Ile Lys Ile 70 75 Ser Lys Arg Asn Ile Lys Gly Gly Glu Ile Cys Lys Ala Thr Gly Trp 90 Gly Leu Leu Gly Ser Val Asp Ser Val Pro Asn Glu Leu Gln Gln Val 100 105 110 Glu Thr Thr Ala Ile Thr Asp Glu Lys Cys Phe Glu Leu Thr Gln Phe 115 120 125 Ile Asp Pro Thr Ser Gln Ile Cys Thr Phe Arg Glu Phe Gly Arg Gly 130 135 140 Ile Cys Phe Gly Asp Ser Gly Gly Pro Leu Val Tyr Lys Asn Glu Leu 145 150 155 160 Val Gly Ile Thr Ser Met His Leu Tyr Xaa Cys Arg Gly Gly Arg Pro 165 170 175 Asp Ile Phe Cys Glu Ser Ala Arg Phe Xaa Xaa Leu Asp Xaa Xaa Leu 180 185 190 Lys Leu Lys Lys Ile Lys 195 INFORMATION FOR SEQ ID NO:162: SEQUENCE CHARACTERISTICS: LENGTH: 23 amino acids TYPE: amino acid

STRANDEDNESS:

TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:162: Ile Val Gly Gly Val Ser Val Asn Ile Asn Asp Tyr Gly Tyr Gln Leu 1 5 10 Ser Leu Gln Ser Asn Gly Arg INFORMATION FOR SEQ ID NO:163: SEQUENCE CHARACTERISTICS: LENGTH: 17 amino acids TYPE: amino acid

STRANDEDNESS:

TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:163: Ile Val Gly Gly His Asp Thr Ser Ile Lys Gln His Pro Tyr Gln Val 1 5 10 Ser INFORMATION FOR SEQ ID NO:164: SEQUENCE CHARACTERISTICS: LENGTH: 32 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:164: GAGCTCTCGA GAATCGTAGG AGGACACGAT AC 32 INFORMATION FOR SEQ ID NO:165: SEQUENCE CHARACTERISTICS: LENGTH: 32 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:165: GGACGAATTC TTAAACACCA GACACTTCCT TG 32

While various embodiments of the present invention have been described in detail, it is apparent that modifications and adaptations of those embodiments will occur to those skilled in the art. It is to be expressly understood, however, that such modifications and adaptations are within the scope of the present invention, as set forth in the following claims.
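The nucleic acid sequences listed above, for example SEQ ID NO:158 and SEQ ID NO:160, contain ambiguity codes such as K, R and Y, and the paired translations show Xaa at the affected positions. The following is a minimal sketch of how such a codon could be resolved, assuming the standard genetic code and the usual IUPAC meanings of the ambiguity letters. The function names `translate_codon` and `translate` are hypothetical and are not taken from the specification; the sketch is illustrative only and is not the method used to prepare the sequence listing.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position
# (an assumption of this sketch; the listing does not state the table used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter residue names, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

# IUPAC nucleotide codes (subset sufficient for this sketch).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon; return 'Xaa' when the ambiguity codes
    allow more than one amino acid."""
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    residues = {CODON_TABLE["".join(bases)] for bases in expansions}
    if len(residues) == 1:
        return THREE_LETTER[residues.pop()]
    return "Xaa"


def translate(codons):
    """Translate a list of codons into three-letter residue names."""
    return [translate_codon(codon) for codon in codons]


if __name__ == "__main__":
    # Codons taken from SEQ ID NO:158: GTG GKG AAT and CCC CRG TAC.
    print(translate(["GTG", "GKG", "AAT"]))
    print(translate(["CCC", "CRG", "TAC"]))
```

Under these assumptions, a codon such as TTY still resolves to a single residue because both TTC and TTT encode Phe, whereas GKG, CRG and YCC each expand to codons for two different residues and are therefore reported as Xaa.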

The word 'comprising' and forms of the word 'comprising' as used in this description and in the claims do not limit the invention claimed to exclude any variants or additions.

Claims

1. An isolated nucleic acid molecule that hybridizes under stringent hybridization conditions with a gene selected from the group consisting of a serine protease gene comprising a nucleic acid molecule selected from the group consisting of SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122, SEQ ID NO:84 and SEQ ID NO:45; an aminopeptidase gene comprising a nucleic acid molecule selected from the group consisting of SEQ ID NO:110 and SEQ ID NO:112; and a cysteine protease gene comprising a nucleic acid molecule selected from the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:76 and SEQ ID NO:94.

2. An isolated nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95.

3. An isolated protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95.

4. A therapeutic composition that, when administered to an animal, reduces flea infestation, said therapeutic composition comprising a protective compound selected from the group consisting of: an isolated protein or mimetope thereof encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95; an isolated nucleic acid molecule that hybridizes under stringent hybridization conditions with a gene comprising a nucleic acid sequence selected from the group consisting of SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122, SEQ ID NO:84, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:76, SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93 and SEQ ID NO:94; an isolated antibody that selectively binds to a protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95; and a mixture thereof.

5. A method to reduce flea infestation comprising treating an animal with a therapeutic composition comprising a protective compound selected from the group consisting of: an isolated protein or mimetope thereof encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95; an isolated nucleic acid molecule that hybridizes under stringent hybridization conditions with a gene comprising a nucleic acid sequence selected from the group consisting of SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122, SEQ ID NO:84, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:76, SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93 and SEQ ID NO:94; an isolated antibody that selectively binds to a protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95; an inhibitor of protease activity identified by its ability to inhibit the activity of a protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95; and a mixture thereof.

6. A method to produce a flea protease protein, said method comprising culturing a cell capable of expressing said protein, said protein being encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a gene comprising a nucleic acid sequence selected from the group consisting of SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122, SEQ ID NO:84, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:76, SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93 and SEQ ID NO:94.

7. A method to identify a compound capable of inhibiting flea protease activity, said method comprising: contacting an isolated flea protease protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95 with a putative inhibitory compound under conditions in which, in the absence of said compound, said protein has proteolytic activity; and determining if said putative inhibitory compound inhibits said activity.

8. A test kit to identify a compound capable of inhibiting flea protease activity, said test kit comprising an isolated flea protease protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95, and a means for determining the extent of inhibition of said activity in the presence of a putative inhibitory compound.

9. An isolated flea protease protein that cleaves an immunoglobulin when said flea protease protein is incubated in the presence of said immunoglobulin in about 100 microliters of about 0.2 M Tris-HCl for about 18 hours at about 37°C, wherein said flea protease protein: binds to p-aminobenzamidine cross-linked to sepharose beads in a buffer containing 50 mM Tris-HCl (pH ), 100 mM CaCl2 and 400 mM NaCl; and binds to a polypropylaspartamide hydrophobic interaction chromatography (HIC) column in buffer containing 0.1 M KPO4 (pH ) and 2 M (NH4)2SO4.

10. A method to identify a compound capable of inhibiting flea immunoglobulin proteinase protein activity, said method comprising: contacting an isolated flea immunoglobulin proteinase protein with a putative inhibitory compound under conditions in which, in the absence of said compound, said protein has immunoglobulin proteinase activity; and determining if said putative inhibitory compound inhibits said activity.

11. The invention of Claim 1, wherein said serine protease gene comprises a nucleic acid sequence that encodes an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69 and SEQ ID NO:85, and wherein said aminopeptidase gene comprises a nucleic acid sequence that encodes an amino acid sequence selected from the group consisting of SEQ ID NO:107, SEQ ID NO:111 and SEQ ID NO:113, and wherein said cysteine protease gene comprises a nucleic acid sequence that encodes an amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92, SEQ ID NO:95 and SEQ ID NO:77.

12. The invention of Claims 1 or 2, wherein said nucleic acid molecule comprises a nucleic acid sequence that encodes a flea protease protein selected from the group consisting of a larval serine protease protein, an adult serine protease protein, a larval aminopeptidase protein, an adult aminopeptidase protein, a larval cysteine protease protein and an adult cysteine protease protein.

13. The invention of Claims 1 or 2, wherein said nucleic acid molecule is a flea nucleic acid molecule.

14. The invention of Claims 1 or 2, wherein said nucleic acid molecule is selected from the group consisting of Ctenocephalides, Ceratophyllus, Diamanus, Echidnophaga, Nosopsyllus, Pulex, Tunga, Oropsylla, Orchopeus and Xenopsylla nucleic acid molecules.

15. The invention of Claims 1 or 2, wherein said nucleic acid molecule is selected from the group consisting of Ctenocephalides felis, Ctenocephalides canis, Ceratophyllus pulicidae, Pulex irritans, Oropsylla (Thrassis) bacchi, Oropsylla (Diamanus) montana, Orchopeus howardi, Xenopsylla cheopis and Pulex simulans nucleic acid molecules.

16. The invention of Claims 1 or 2, wherein said nucleic acid molecule comprises a Ctenocephalides felis nucleic acid molecule.

17. The invention of Claim 1, wherein said nucleic acid molecule hybridizes under stringent hybridization conditions with a nucleic acid molecule selected from the group consisting of nfSP18 5 3 4 nfSP18 77 5 nfSP18 2 25 1 nfSP24 410 nfSP24 089 nfSP24 77 4 nfSP24 7 11 nfSP2871 nS2923' nfSP32 93 3 nfSP32, 24 nfSP32 699 nfSP33 426 nfSP33 77 nfSP331 8 94 nfSP33 12 00 nfSP33 7 26 nfSP40 84 1 nfSP40 71 7 nfSP59 06 nfSP11 3 0 7 nfSP8 51 nfSP8 4 3 6 nfSP12 758 nfSP266 0 nfSP27 3. nfSP23 4. nfSP34 y~ nfSP361 9 7 nfSP38 34 1, nfSP37 2 6 1, nfSP39 267 nfSP29 612 nfSP30 64 1, nfSP31 62 6 nfSP32 433 nfSP15, 1 5 nfSP19 55 nfSP25, 64 nfSP21 5 95 nfAP2 3 3 nfAP2 53 nfCP 5 7 3 and nfCP1 1109.

18. The invention of Claim 1, wherein said nucleic acid molecule comprises a nucleic acid molecule selected from the group consisting of nfSP18 5 34 nfSP8 7 7 5 nfSP18 22 5 nfSP24 410 nfSP24, 089 nfSP24 77 4 nfSP24 711 nfSP28 711 nfSP28 923 nfSP32 9 3 nfSP32 924 nfSP32 699 nfSP33 426 nfSP33 7 7 8 nfSP33113 9 4 nfSP331 2 00 nfSP33-7 2 6 nfSP40 84 1 nfSP40 7 17 nfSP5 8 0 6 nfSP11 30 7 nfSP8 5 1 5 nfSP8 4 3 6 nfSP12 758 nfSP26 610 nfSP27 3 6 nfSP23 42 3 nfSP34 390 nfSP36 19 nfSP38 3 41 nfSP37 2 1 nfSP39 2 7 nfSP29 712 nfSP30 6 4 1 nfSP31 6 2 6 nfSP32 43 3 nfSP15 15 nfSP9 5 5 nfSP25,, 4 nfSP21 5 95 nfAP2 3 3 nfAP2 537 nfCP1 7 3 and nfCP 1109

19. The invention of Claims 1 or 2, wherein said nucleic acid molecule is selected from the group consisting of: a nucleic acid molecule comprising a nucleic acid sequence that encodes a protein having an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95; and a nucleic acid molecule comprising an allelic variant of a nucleic acid molecule encoding any of said amino acid sequences.

20. The invention of Claim 1, wherein said nucleic acid molecule is selected from the group consisting of a nucleic acid molecule comprising a nucleic acid sequence selected from the group consisting of SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:120, SEQ ID NO:130, SEQ ID NO:154, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:127, SEQ ID NO:121, SEQ ID NO:131, SEQ ID NO:155, SEQ ID NO:114, SEQ ID NO:125, SEQ ID NO:118, SEQ ID NO:128, SEQ ID NO:152, SEQ ID NO:156, SEQ ID NO:160, SEQ ID NO:136, SEQ ID NO:78, SEQ ID NO:158, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:66, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:142, SEQ ID NO:138, SEQ ID NO:144, SEQ ID NO:140, SEQ ID NO:122, SEQ ID NO:84, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:76, SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:93 and SEQ ID NO:94; and a nucleic acid molecule comprising an allelic variant of a nucleic acid molecule having any of said nucleic acid sequences.

21. The invention of Claim 1, wherein said nucleic acid molecule comprises an oligonucleotide.

22. The invention of Claim 3, wherein said protein, when administered to an animal elicits an immune response against a flea protease selected from the group consisting of a flea serine protease, a flea aminopeptidase, and a flea cysteine protease.

23. The invention of Claim 12, wherein said protein, when administered to an animal elicits an immune response against a flea protease selected from the group consisting of a flea serine protease, a flea aminopeptidase, and a flea cysteine protease.

24. A recombinant molecule comprising a nucleic acid molecule as set forth in Claims 1 or 2 operatively linked to a transcription control sequence.

25. A recombinant virus comprising a nucleic acid molecule as set forth in Claims 1 or 2.

26. A recombinant cell comprising a nucleic acid molecule as set forth in Claims 1 or 2, said cell being capable of expressing said nucleic acid molecule.

27. The invention of Claim 2, wherein said nucleic acid molecule hybridizes under stringent hybridization conditions with the complement of a nucleic acid sequence encoding said protein.

28. The invention of Claim 3, wherein said protein is selected from the group consisting of: a protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95; and a protein encoded by an allelic variant of a nucleic acid molecule encoding a protein comprising any of said amino acid sequences.

29. The invention of Claim 3, wherein said protein is used to identify an inhibitor selected from the group consisting of an inhibitor of flea serine protease activity, an inhibitor of aminopeptidase activity, and an inhibitor of flea cysteine protease activity.

30. The invention of Claim 29, wherein said inhibitor, when administered to an animal, is capable of reducing flea infestation.

31. An isolated antibody that selectively binds to a protein as set forth in Claim 3.

32. The invention of Claims 4, 5 or 52, further comprising a component selected from the group consisting of an excipient, an adjuvant, a carrier, and a mixture thereof.

33. The invention of Claims 4 or 5, wherein said composition comprises a controlled release formulation.

34. The invention of Claims 4 or 5, wherein said composition further comprises a compound that reduces flea burden by a method other than by reducing flea protease activity.

35. The invention of Claim 5, wherein said animal is selected from the group consisting of adult fleas, flea larvae and animals susceptible to flea infestation.

36. The invention of Claim 5, wherein larval flea infestation is reduced by flea larvae ingesting adult flea feces comprising said therapeutic composition.

37. The invention of Claim 5, wherein larval flea infestation is reduced by flea larvae ingesting adult flea feces, said feces comprising anti-flea protease antibodies elicited in a host animal in response to administration of one or more of said isolated flea protease proteins, said adult flea having fed from said host animal after said administration.

38. The invention of Claim 5, wherein said animal is selected from the group consisting of mammals and birds.

39. The invention of Claim 5, wherein said animal is selected from the group consisting of cats and dogs.

40. The invention of Claim 5, wherein said fleas are of a genus selected from the group consisting of Ctenocephalides, Cyopsyllus, Diamanus, Echidnophaga, Nosopsyllus, Pulex, Tunga, and Xenopsylla.

41. The invention of Claim 5, wherein said fleas are of a species selected from the group consisting of Ctenocephalides felis, Ctenocephalides canis, Pulex irritans and Pulex simulans.

42. The invention of Claim 5, wherein said composition further comprises a compound that reduces flea burden by a method other than by reducing flea protease activity.

43. The invention of Claim 9, wherein said protease protein cleaves immunoglobulin heavy chain.

44. The invention of Claim 9, wherein said protease protein cleaves the hinge region of an immunoglobulin heavy chain.

45. The invention of Claim 9, wherein the mature form of said protein has a molecular weight ranging from about kD to about 35 kD.

46. The invention of Claim 45, wherein the mature form of said protein has a molecular weight of about 31 kD.

47. The invention of Claim 9, wherein said protein comprises an amino acid sequence selected from the group consisting of SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73 and SEQ ID NO:96.

48. The invention of Claim 9, wherein said flea protease protein comprises an amino acid sequence encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a flea serine protease gene that cleaves immunoglobulins.

49. The invention of Claim 48, wherein said serine protease gene comprises nucleic acid sequence SEQ ID NO:66.

50. The invention of Claim 48, wherein said serine protease gene comprises a nucleic acid sequence that encodes an amino acid sequence selected from the group consisting of SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73 and SEQ ID NO:96.

51. The invention of Claim 9, wherein said protein is encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule selected from the group consisting of nfSP28711 and nfSP28923.

52. The invention of Claim 9, wherein said protein comprises an amino acid sequence encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with nucleic acid molecule SEQ ID NO:66.

53. The invention of Claim 9, wherein said protein comprises an amino acid sequence encoded by a nucleic acid molecule selected from the group consisting of nfSP28711 and nfSP28923.

54. The invention of Claim 9, wherein said protein is produced by a method comprising culturing a cell expressing nucleic acid molecule nfSP28711.

55. A therapeutic composition comprising an excipient and a protective compound selected from the group consisting of a nucleic acid molecule of Claim 48 and a protein of Claim 47.

56. The invention of Claim 55, wherein said composition further comprises a component selected from the group consisting of an adjuvant, a carrier, and a mixture thereof.

57. The invention of Claim 55, wherein said composition further comprises a compound that reduces hematophagous ectoparasite burden by a method other than by reducing flea immunoglobulin proteinase activity.

58. The invention of Claim 55, wherein said composition is administered to an animal to reduce ectoparasite infestation of said animal.

59. A therapeutic composition when used to reduce flea infestation of an animal, said therapeutic composition comprising an inhibitor of protease activity identified by its ability to inhibit the activity of a protein encoded by a nucleic acid molecule that hybridizes under stringent hybridization conditions with a nucleic acid molecule having a nucleic acid sequence encoding a protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27, SEQ ID NO:30, SEQ ID NO:33, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:41, SEQ ID NO:44, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:96, SEQ ID NO:115, SEQ ID NO:126, SEQ ID NO:119, SEQ ID NO:129, SEQ ID NO:153, SEQ ID NO:157, SEQ ID NO:161, SEQ ID NO:137, SEQ ID NO:79, SEQ ID NO:159, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:67, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:143, SEQ ID NO:139, SEQ ID NO:145, SEQ ID NO:141, SEQ ID NO:123, SEQ ID NO:68, SEQ ID NO:163, SEQ ID NO:162, SEQ ID NO:69, SEQ ID NO:85, SEQ ID NO:107, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:77, SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:89, SEQ ID NO:92 and SEQ ID NO:95