AU603883B2

AU603883B2 - Genetic engineering process for the preparation of polypeptides

Info

Publication number: AU603883B2
Application number: AU14469/88A
Authority: AU
Inventors: Wolfgang Becker; Hans Willi Jansen; Waldemar Wetekam
Original assignee: Hoechst AG
Current assignee: Hoechst AG
Priority date: 1987-04-11
Filing date: 1988-04-11
Publication date: 1990-11-29
Anticipated expiration: 2008-04-11
Also published as: FI881617A0; GR3007417T3; JPS63263087A; IL86023A0; EP0286956A2; DK196188D0; AU1446988A; NO881548D0; PT87211B; KR880012758A; EP0286956B1; DE3877035D1; DK196188A; EP0286956A3; NO881548L; IL86023A; PT87211A; DE3805150A1; FI881617A7; NZ224184A

Description

COMMONWEALTH OF AUSTRALIA PATENTS ACT 1952-69 COMPLETE SPECIFICATION (ORIGINAL)

This document contains the amendments made under Section 49 and is correct for printing.

Name of Applicant: HOECHST AKTIENGESELLSCHAFT

Address of Applicant: 45 Bruningstrasse, D-6230 Frankfurt/Main 80, Federal Republic of Germany

Actual Inventors: WALDEMAR WETEKAM, HANS WILLI JANSEN AND WOLFGANG BECKER

Address for Service: EDWD. WATERS & SONS, 50 QUEEN STREET, MELBOURNE, AUSTRALIA, 3000.

Complete Specification for the invention entitled: GENETIC ENGINEERING PROCESS FOR THE PREPARATION OF POLYPEPTIDES

The following statement is a full description of this invention, including the best method of performing it known to us.

HOECHST AKTIENGESELLSCHAFT HOE 87/F 110J Dr.KL/mu

Specification

Genetic engineering process for the preparation of polypeptides

In the preparation, by genetic manipulation, of polypeptides in bacteria, the structural gene for the desired polypeptide is frequently coupled in the reading frame to the gene for the polypeptide β-galactosidase which is intrinsic to the bacteria. The bacterium then produces a fusion protein in which the amino terminus of the desired polypeptide is bonded to the carboxyl terminus of the β-galactosidase.

In the chemical or enzymatic elimination of the β-galactosidase portion the latter is broken down into many fragments. This results in fragments of the β-galactosidase which, in respect of the properties relevant for fractionation, such as molecular weight, are similar to the desired protein. This makes the working up and isolation of the desired polypeptide very difficult and reduces the yield. There is a description, in Broker, Gene Anal. Techn. 3 (1986) 53-57, of how fusion proteins having shortened β-galactosidase portions can be prepared. It is mentioned as an advantage of these fusion proteins that cleavage with cyanogen bromide yields fewer cleavage products, which facilitates purification. Although this reduces the number of cleavage products, it does not eliminate the abovementioned disadvantage because this known process also results in fragments which are very similar to the desired product in the properties relevant for fractionation.

The disadvantages of the known processes are avoided according to the invention by replacing, in whole or in part, codons for methionine and/or arginine and/or cysteine in the gene for β-galactosidase, or for a fragment of β-galactosidase, by codons of other amino acids.

In this way it is possible to "tailor" the fragments of the β-galactosidase portion in such a way that there are no problems with the removal of the desired polypeptide.
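The effect of replacing methionine codons can be illustrated with a small sketch. It assumes the usual simplification that cyanogen bromide cleaves the peptide bond on the carboxyl side of every methionine. The carrier and target sequences and the function name `cnbr_fragments` are hypothetical and are not taken from the specification.

```python
def cnbr_fragments(protein: str) -> list[str]:
    """Split a one-letter protein sequence after every methionine (M)."""
    fragments, current = [], ""
    for residue in protein:
        current += residue
        if residue == "M":          # assumed cleavage point: C-terminal side of Met
            fragments.append(current)
            current = ""
    if current:                     # keep the C-terminal remainder
        fragments.append(current)
    return fragments


# Hypothetical carrier with two internal methionines and a Met-free target peptide.
carrier = "TMITDSLAVMLQRR"
target = "FVNQHLCGSH"

fusion = carrier + "M" + target                       # Met placed in front of the target
modified = carrier.replace("M", "L") + "M" + target   # carrier Met replaced by Leu

print(cnbr_fragments(fusion))    # carrier falls apart into several pieces
print(cnbr_fragments(modified))  # one carrier fragment plus the target
```

With the unmodified carrier the target has to be separated from three carrier fragments; after the replacement only one carrier fragment remains, and its size can be chosen to differ clearly from that of the target.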

Hence the invention relates to a process for the preparation of a genetically codable polypeptide, with the structural gene for this polypeptide being coupled in the correct reading frame via the gene for β-galactosidase, or a fragment of β-galactosidase, to a regulator region, introduction of this gene structure into a bacterium, expression therein of an insoluble fusion protein, isolation thereof after cell disruption, and the desired polypeptide being obtained by chemical or enzymatic cleavage, which comprises codons for methionine and/or arginine and/or cysteine in the gene for β-galactosidase, or for the fragment of β-galactosidase, being replaced, in whole or in part, by codons of other amino acids.

Further aspects of the invention and its preferred embodiments are explained hereinafter and defined in the patent claims.

It has been found that constructions having a β-galactosidase fragment with more than 250 amino acids, but significantly less than the total sequence of β-galactosidase, result in insoluble fusion proteins which can easily be isolated. This β-galactosidase portion advantageously has about 300 to about 800, preferably about 320 to about 650, amino acids and contains an amino-terminal and/or carboxyl-terminal portion of β-galactosidase.

The gene for β-galactosidase, or the β-galactosidase fragment, is bonded in the correct reading frame to a regulator region and, if necessary via an adaptor, to the structural gene for the desired polypeptide.

This adaptor, which can also be omitted where appropriate, advantageously codes for an amino acid (or a group of amino acids) which is located in front of the amino terminus of the desired polypeptide and permits the easy chemical or enzymatic separation thereof from the β-galactosidase portion. If, for example, the desired polypeptide contains no methionine or has been modified by genetic manipulation in such a way that it contains no methionine, then it is advantageous to choose methionine as the amino acid in front of the amino terminus of the desired polypeptide, after which the desired polypeptide can be separated from the β-galactosidase portion by cleavage with cyanogen chloride or cyanogen bromide. If the desired polypeptide is not inactivated by trypsin, then it is possible to program the amino acid arginine in front of the amino terminus of the desired polypeptide, after which the desired polypeptide is obtained by trypsin cleavage.

Of course, it is also possible in this case to eliminate any trypsin cleavage site present in the desired polypeptide by genetic manipulation.
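Whether an arginine junction is suitable depends on the trypsin sites inside the desired polypeptide. The following sketch lists such sites under a commonly used simplification (cleavage after Lys or Arg, but not when the next residue is Pro). The function names are hypothetical, and the human insulin A and B chains serve only as illustrative input.

```python
def trypsin_sites(protein: str) -> list[int]:
    """Return 0-based indices of Lys/Arg residues after which trypsin is expected to cut.

    Simplified rule (an assumption of this sketch): cut after K or R unless
    the following residue is P. The C-terminal residue is ignored.
    """
    return [
        i
        for i in range(len(protein) - 1)
        if protein[i] in "KR" and protein[i + 1] != "P"
    ]


def arginine_junction_usable(target: str) -> bool:
    """True if the target itself contains no predicted internal trypsin site."""
    return not trypsin_sites(target)


insulin_a_chain = "GIVEQCCTSICSLYQLENYCN"
insulin_b_chain = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"

print(trypsin_sites(insulin_a_chain), arginine_junction_usable(insulin_a_chain))
print(trypsin_sites(insulin_b_chain), arginine_junction_usable(insulin_b_chain))
```

A target with internal sites, such as the B chain here, would either be released as a derivative or would first need the sites removed by genetic manipulation, as the text notes.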

Shortened β-galactosidase constructions are generally preferred because the capacity of the host cell for the production of the foreign protein is limited, and thus a larger portion of the fusion protein is available for the desired protein. Hence, this by itself results in a crucial improvement in the yield. Another advantage is that not only can the insoluble fusion proteins be readily isolated after cell disruption, they are also not degraded to a noteworthy extent by proteases intrinsic to the host. A longer induction period is thus possible, which results in greater accumulation of the foreign protein in the bacterium. Furthermore, owing to the easier working up, there are lower losses of yield on isolation of the desired protein so that, overall, significantly higher yields are obtained than result by the known processes.

The figures illustrate the preferred embodiment of the invention with shortened β-galactosidase constructions, no account being taken of the extent to which there has been a replacement of "undesired" by "desired" codons.

Fig. 1 shows constructions of shortened β-galactosidase sequences without (A) and with (B) linkers.

Fig. 2 shows the construction of the plasmid pWZ RI (a derivative of the plasmid pBR 322 with β-galactosidase fragments from the plasmids pUC 9 and pUR 270).

Figs. 3A to 3D show the construction of the plasmid pWI 6 having monkey proinsulin DNA.

Fig. 4 shows the construction of the plasmid pWZIP dMdC A 2 from Fig. 1.

Fig. 5 shows the construction of the plasmid pWZPWB1 dMdC from pWI 6 (Fig. 3D) and pWZIP dMdC (Fig.). pWZPWB1 dMdC contains a polylinker (MCS) which permits the insertion of genes for expression of the desired polypeptides.

The regulator region can be natural, especially intrinsic to the bacteria, chemically synthesized or a hybrid region, for example that of the fusion promoter tac. The regulator regions additionally contain an operator, for example the lac operator, and, 6 to 14 nucleotides upstream of the methionine codon of the β-galactosidase fragments, a ribosome binding site.

The β-galactosidase fragment advantageously comprises a fusion of an amino-terminal and carboxyl-terminal part-sequence. This results in a considerable reduction in the β-galactosidase portion in the fusion protein. In addition, this construction allows the replacement of various regulator systems without special effort. However, it is also possible to use exclusively amino-terminal or predominantly carboxyl-terminal sequences of β-galactosidase.

It is possible to use natural restriction enzyme cleavage sites for these shortened β-galactosidase constructions. However, it is also possible to employ constructions having chemically synthesized linkers or adaptors which guarantee a correct reading frame free of stop codons and, where appropriate, contain an ATG start codon. Figure 1 A shows β-galactosidase fragments in which natural restriction enzyme cleavage sites have been used, and Figure 1 B shows the like where a linker is employed.

Further modifications of the β-galactosidase fragment may prove advantageous, singly or in combination, in particular cases.

In the case of polypeptides which are to be separated from the β-galactosidase fragment by cleavage with cyanogen chloride or cyanogen bromide, the purification of the former is facilitated if, by targeted in vitro mutagenesis, some, or advantageously all, of the codons of the interfering methionine residues are replaced by codons for other amino acids, preferably leucine or isoleucine.

In a fusion protein modified in this way, the result of cleavage is, besides the desired polypeptide, a reduced number of β-galactosidase cleavage fragments which can be chosen such that they can easily be separated from the desired polypeptide on the basis of their size and/or charge.
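At the sequence level, the codon replacement described above amounts to an in-frame substitution. The sketch below is only an illustration: the open reading frame is hypothetical, the function name `replace_codons` is invented for this example, and keeping the first (start) codon unchanged is an assumption of the sketch.

```python
def replace_codons(orf: str, old: str, new: str, keep_first: bool = True) -> str:
    """Replace every in-frame occurrence of codon `old` by `new`.

    Only complete codons in frame 0 are considered, so an ATG that spans
    two codons (out of frame) is left alone. With keep_first=True the first
    codon is never changed (assumed here to be the start codon).
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    result = []
    for index, codon in enumerate(codons):
        if codon == old and not (keep_first and index == 0):
            result.append(new)
        else:
            result.append(codon)
    return "".join(result)


# Hypothetical reading frame: ATG GCA TGC ATG TAA.
# "GCA TGC" contains an out-of-frame ATG that must not be touched.
orf = "ATGGCATGCATGTAA"

no_met = replace_codons(orf, "ATG", "CTG")         # internal Met -> Leu
no_met_no_cys = replace_codons(no_met, "TGC", "TCC")  # Cys -> Ser

print(no_met)
print(no_met_no_cys)
```

In practice each replacement is introduced by a mutagenic primer, so the substitution codon (for example CTG or ATC for methionine) is chosen per position rather than globally as in this simplified example.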

In the case of polypeptides which are to be separated from the β-galactosidase fragment by acid cleavage of the polypeptide bond between aspartic acid and proline, it may be advantageous to modify the appropriate codons in the β-galactosidase fragment by targeted in vitro mutagenesis in such a way that acid cleavage is no longer possible at these points, preferably by converting the codon for aspartic acid into a codon for glutamic acid.
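The Asp-Pro modification can be sketched in the same way. The example assumes a hypothetical reading frame and an invented function name `block_asp_pro`; it converts an aspartic acid codon (GAT or GAC) into a glutamic acid codon only where the next codon encodes proline (CCN).

```python
ASP_CODONS = {"GAT", "GAC"}


def block_asp_pro(orf: str, glu_codon: str = "GAA") -> str:
    """Convert Asp codons that directly precede a Pro codon into a Glu codon."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    for i in range(len(codons) - 1):
        # All four proline codons start with "CC".
        if codons[i] in ASP_CODONS and codons[i + 1].startswith("CC"):
            codons[i] = glu_codon
    return "".join(codons)


# Hypothetical frame: GTT GAT CCG GAC AAA (Val Asp Pro Asp Lys).
orf = "GTTGATCCGGACAAA"
print(block_asp_pro(orf))  # only the Asp in front of Pro is changed
```

The second aspartic acid codon is left untouched because it is not followed by proline and therefore does not form an acid-labile Asp-Pro bond.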

In the case of polypeptides which are to be separated from the β-galactosidase fragment by cleavage with trypsin, it is possible to modify, by targeted in vitro mutagenesis, codons for arginine and/or lysine in such a way that the β-galactosidase fragments resulting after cleavage with trypsin can easily be separated from the desired polypeptide on the basis of their size and/or charge.

In the case of polypeptides which contain no cysteine, it is possible to insert a codon for the amino acid cysteine between the DNA coding for the β-galactosidase fragment and the DNA coding for the desired polypeptide. It is then possible, by subsequent specific S-cyanylation, to cleave the desired polypeptide off from the β-galactosidase portion.

In the case of polypeptides for which the formation of disulfide bridges is relevant, it generally proves beneficial to convert the codons for cysteine which are present in the β-galactosidase fragment, by targeted in vitro mutagenesis, into codons for other amino acids, preferably serine, in order in this way to avoid any possible formation of wrong disulfide bridges between the β-galactosidase fragment and the polypeptide.

The adaptor between the β-galactosidase fragment and the structural gene for the desired polypeptide, which adaptor can be omitted in favorable cases, codes immediately in front of the amino terminus of the desired polypeptide for an amino acid or a sequence of amino acids which allow easy separation of the desired polypeptide from the β-galactosidase portion. As already mentioned, this amino acid can be methionine, which allows straightforward cleavage with cyanogen bromide, as long as the desired polypeptide contains no methionine or the corresponding codons have been modified by genetic manipulation. One example of an adaptor of this type has the following nucleotide sequence:

(Eco RI)  AAT TAT GAA TTC GCA ATG
               TA CTT AAG CGT TAC

One example of an adaptor which in the reading frame encodes a trypsin cleavage site has the following sequence:

(Eco RI)  AAT TAT GAA TTC GCA AGA
               TA CTT AAG CGT TCT

An additional facilitation of the various modalities of cleavage, chemical or enzymatic, can be achieved by steric separation of the desired polypeptide from the β-galactosidase portion. For this purpose, the codons for a poly(amino acid) are inserted, via a special chemically synthesized adaptor, between the β-galactosidase portion and the polypeptide. In general, it is possible to use as amino acids, in view of the different structuring of this poly(amino acid) "arm", all genetically codable amino acids, for example small uncharged amino acids such as glycine, alanine, serine or proline, or charged amino acids such as aspartic acid and glutamic acid on the one hand, or lysine and arginine on the other hand. The poly(amino acid) chain expediently encompasses 5 to 30, preferably 10 to 24, in particular 15 to 20, amino acids. Depending on the choice of the poly(amino acids), it is possible to achieve direct folding-back of the desired gene product in conjunction with the β-galactosidase portion.
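The two short adaptors can be checked for internal consistency with a few lines of code. The sketch assumes that the upper strand is written 5' to 3', that the lower strand is printed 3' to 5' directly beneath it, and that the unpaired bases at the left form the Eco RI-compatible overhang. The function name `check_adaptor` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def check_adaptor(upper: str, lower: str) -> tuple[str, bool]:
    """Return the single-stranded overhang and whether the rest is base-paired.

    `upper` is read 5'->3'; `lower` is the strand printed underneath (3'->5'),
    right-aligned with the upper strand (an assumption of this sketch).
    """
    upper = upper.replace(" ", "")
    lower = lower.replace(" ", "")
    overhang_length = len(upper) - len(lower)
    overhang = upper[:overhang_length]
    paired = upper[overhang_length:].translate(COMPLEMENT) == lower
    return overhang, paired


met_adaptor = check_adaptor("AAT TAT GAA TTC GCA ATG", "TA CTT AAG CGT TAC")
arg_adaptor = check_adaptor("AAT TAT GAA TTC GCA AGA", "TA CTT AAG CGT TCT")

print(met_adaptor)  # overhang and pairing check for the methionine adaptor
print(arg_adaptor)  # overhang and pairing check for the arginine adaptor
```

Both adaptors carry the four-base AATT overhang, and their last codons in the printed frame are ATG (methionine) and AGA (arginine) respectively, in line with the cleavage methods named in the text.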

The structural gene for the desired polypeptide can be obtained in a manner known per se from natural sources or chemically synthesized. Reference may be made to EP-B 0,032,675 as an example of the isolation of a gene from natural material. Chemically synthesized genes suited to the specific codon usage in the host cell, such as described in, for example, German Offenlegungsschriften 3,327,007 (derivatives of growth hormone releasing factor), 3,328,793 (derivative of secretin), 3,409,966 (human γ-interferon), 3,414,831 (derivatives of human γ-interferon), 3,419,995 (interleukin-2 and derivatives) and 3,429,430 (hirudin derivatives) or proposed in (not prepublished) German Offenlegungsschrift 3,632,037 (calcitonin), are advantageous.

The incorporation of the gene structure composed of regulator region, β-galactosidase gene fragment, where appropriate adaptor and structural gene into a suitable vector, the introduction of the hybrid vector obtained in this way into a suitable host cell, the cultivation of the host cells, the cell disruption, the isolation and cleavage of the fusion protein, and the isolation of the desired polypeptide are generally known. Reference may be made to the widely available textbooks and handbooks for this purpose.

Preferred vectors are plasmids, in particular the plasmids compatible with E. coli, such as pBR 322, pBR 325, pUC 8 and pUC 9, as well as other commercially available or generally accessible plasmids. The preferred bacterial host is E. coli.

The invention is explained in detail in the examples which follow.

Example 1

20 µg of the commercially available plasmid pUC 9 (cf. Vieira et al., Gene 19 (1982) 259-268; The Molecular Biology Catalogue, Pharmacia P-L Biochemicals, 1984, appendix, p. 40) are subjected to double digestion with the restriction endonucleases Eco RI and Pvu I and a DNA fragment 123 base-pairs (Bp) in length is separated out by gel electrophoresis. This fragment encompasses part of the amino-terminal coding sequence of β-galactosidase.

To isolate the carboxyl-terminal portion of the β-galactosidase gene up to the natural Eco RI cleavage site, 20 µg of the plasmid pUR 270 (Rüther and Müller-Hill, EMBO J. 2 (1983) 1791-1794) are initially digested with Eco RI and then subjected to partial digestion with the enzyme Pvu I. A DNA fragment of 2895 Bp is separated out by electrophoresis on a 5% polyacrylamide gel and is isolated.

The amino-terminal and carboxyl-terminal DNA fragments of the β-galactosidase gene are ligated together over the course of 6 hours at 16°C, and the ligation product is precipitated with ethanol. The precipitated and resuspended DNA is cut with Eco RI and is then fractionated again on a 5% polyacrylamide gel. The DNA fragment 3018 Bp in length is isolated from the gel by electroelution, and is ligated into the Eco RI cleavage site of the plasmid pBR 322. The hybrid plasmid obtained in this way is called pWZ RI.

The reaction steps described above are shown in Figure 2. The individual measures were carried out in a known manner (Maniatis et al., Molecular Cloning, Cold Spring Harbor 1982).

The plasmid pWZ RI is transformed into E. coli, and is amplified there and re-isolated. It is possible, by digestion with Eco RI, to cleave out the β-galactosidase gene fragment which is shortened at the amino and carboxyl termini, and to isolate it preparatively. The known restriction enzyme cleavage sites can be used for shortening the construction further and for insertion into suitable expression plasmids. Figure 1 A shows these types of shortening.

Figure 1 B shows constructions having a chemically synthesized linker.

Both the constructions in Figure 1 A and those in Figure 1 B are chosen such that the reading frame for the shortened β-galactosidase is directly joined to the reading frame for the desired carboxyl-terminal polypeptide. The chemically synthesized linkers in Figure 1 B can have any desired form but, of course, must guarantee, where appropriate, an ATG start codon and, of course, a reading frame without a stop codon, and have the desired restriction enzyme cleavage sites.

Example 2

A fusion protein composed of monkey proinsulin and a shortened and modified β-galactosidase can be obtained as follows:

µg of the plasmid pWZ RI (Fig. 2) are cut with the restriction enzymes Eco RI and Pvu I and fractionated on a polyacrylamide gel. DNA fragments 123 and 1222 Bp in length are isolated. Equimolar amounts of the two DNA fragments are subsequently ligated together at 10°C for 6 hours, and are then digested with Eco RI. The ligation mixture obtained in this way is fractionated on a polyacrylamide gel, and the DNA band having a length of 1345 Bp is preparatively isolated. This DNA fragment is ligated into the vector pBR 322 which has been opened with Eco RI and subsequently dephosphorylated. The plasmid obtained in this way is called pWZP RI.
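The fragment lengths quoted in Examples 1 and 2 can be cross-checked with simple arithmetic. The sketch below uses only numbers from the text; the conversion of base pairs into codons is approximate, because it ignores how much of the 1345 Bp fragment is flanking sequence, and the function name is hypothetical.

```python
def ligated_length(*fragment_lengths_bp: int) -> int:
    """Length of a linear ligation product built from the given fragments."""
    return sum(fragment_lengths_bp)


example_1 = ligated_length(123, 2895)   # amino-terminal + carboxyl-terminal fragment
example_2 = ligated_length(123, 1222)   # shortened construction of Example 2

# Rough upper bound for the number of beta-galactosidase codons in Example 2.
approx_codons = example_2 // 3
in_preferred_range = 320 <= approx_codons <= 650

print(example_1, example_2, approx_codons, in_preferred_range)
```

The sums agree with the 3018 Bp and 1345 Bp fragments named in the text, and the shortened fragment corresponds to roughly 448 codons, which lies inside the preferred range of about 320 to about 650 amino acids.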

To remove the 8 codons for methionine which pWZP RI contains (M1-M8) and the 6 codons for cysteine which are present in this plasmid (C1-C6), 1 µg of DNA from pWZP RI is cleaved with Eco RI. The fragment 1345 Bp in size is ligated into the phage vector M13mp19am (Patschinsky et al., J. Virol. 59 (1986) 341-353) which has been opened with Eco RI and subsequently dephosphorylated. The phage obtained after transfection of E. coli JM101 is called MWZPam. As a first step for removing the codons for methionine, targeted in vitro mutagenesis by the gapped duplex method (Kramer et al., Nucl. Acids Res. 12 (1984) 9441-9456) is carried out, with, because of the high efficiency of this method (70% on average), the 4 oligonucleotides dM5-dM8 (Tab. 1) being used as mutagenic primers.

The ssDNA of 12 of the resulting phages is sequenced, with, besides the normal 17mer primer, use also being made of dM7 and dC5 (Tab. 1) as primers for the sequencing. 2 of the 12 DNAs have all 4 desired mutations, i.e. the codons for M5-M8 have been altered to codons for isoleucine. These phages are called MWZP dM5,8. To remove the 4 remaining methionine codons, the RF DNA of MWZP dM5,8 is cleaved with Eco RI, and the DNA fragment 1345 Bp in size is cloned into dephosphorylated M13mp19am. The phage obtained in this way is called MWZP dM5,8am. The ssDNA of this phage is subjected to another in vitro mutagenesis with dM1-dM4 as mutagenic primers. The ssDNA from 12 of the resulting phages is sequenced, and 3 of the 12 phages have all the desired mutations, i.e. the codons for M1-M3 have been altered to codons for leucine, and the codon for M4 has been altered to a codon for isoleucine. These phages are called MWZPdM. The plasmid pWZP dM is obtained by cloning the 1345 Bp Eco RI fragment from the RF form of these phages into the dephosphorylated vector pBR 322.
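The clone counts reported for the two methionine rounds can be compared with a rough expectation. The sketch assumes, purely for illustration, that each primer succeeds independently with the quoted average efficiency of 70%; this independence is an assumption of the example and is not stated in the text.

```python
def fraction_with_all_mutations(efficiency: float, n_primers: int) -> float:
    """Expected fraction of clones carrying every mutation (independence assumed)."""
    return efficiency ** n_primers


# (primer set, number of primers, clones with all mutations, clones sequenced)
rounds = [
    ("dM5-dM8", 4, 2, 12),
    ("dM1-dM4", 4, 3, 12),
]

expected = fraction_with_all_mutations(0.70, 4)
for name, n_primers, hits, sequenced in rounds:
    observed = hits / sequenced
    print(f"{name}: expected about {expected:.2f}, observed {observed:.2f}")
```

Under this assumption about a quarter of the clones would carry all four changes, which is of the same order as the 2 of 12 and 3 of 12 clones reported, so sequencing about a dozen phages per round is a reasonable screening effort.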

For the additional conversion of the codons for cysteine into codons for serine, the 1345 Bp Eco RI fragment from pWZP dM is isolated and cloned into the dephosphorylated phage vector M13mp19am. The ssDNA of the phage MWZP dMam obtained in this way is subjected to in vitro mutagenesis with dC1-dC6 as mutagenic primers. The ssDNA of 24 of the resulting phages is sequenced, with 4 of the phage clones, which are called MWZP dMdC, proving to be the correct ones in which all the codons for cysteine have been converted into codons for serine. The RF DNA of these phages is cleaved with Eco RI, and the 1345 Bp Eco RI fragment is cloned into the dephosphorylated vector pBR 322, resulting in the plasmid pWZP dMdC.
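The primers dM1 and dC1 from Table 1 cover the same stretch of the gene, which makes them a convenient illustration of how the mutagenic oligonucleotides work. The sketch assumes that the primers are written 5' to 3' and are complementary to the coding strand; the chosen reading frame is likewise an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence; spaces are ignored."""
    return sequence.replace(" ", "").translate(COMPLEMENT)[::-1]


dM1 = "AG ACC GTT CAG ACA GAA CTG G"      # Table 1, methionine primer
dC1 = "G ACC GTT CAG AGA GAA CTG GCG"     # Table 1, cysteine primer

coding_dM1 = reverse_complement(dM1)
coding_dC1 = reverse_complement(dC1)

# dC1 extends two bases further at its 3' end, so skip two bases to align.
aligned_dC1 = coding_dC1[2:]
mismatches = [
    i for i, (a, b) in enumerate(zip(coding_dM1, aligned_dC1)) if a != b
]

print(coding_dM1)
print(coding_dC1)
print(mismatches, coding_dM1[7:13], aligned_dC1[7:13])
```

Read this way, both primers specify CTG (leucine) at the former methionine position, and they differ in a single base that turns the neighbouring TGT (cysteine) into TCT (serine). This is consistent with the cysteine primers being applied to a template in which the methionine codons had already been replaced.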

The fragment of the β-galactosidase gene which is obtained in this way and is flanked by Eco RI cleavage sites can be integrated as follows into a plasmid which connects together a bacterial regulator region and the gene for monkey proinsulin via an Eco RI cleavage site:

µg of the plasmid pBR 322 are digested with the restriction endonucleases Eco RI and Pvu II, and subsequently a fill-in reaction is carried out with Klenow polymerase at the Eco RI cleavage site. Following fractionation by gel electrophoresis in a 5% polyacrylamide gel the plasmid fragment 2293 Bp in length can be obtained by electroelution (Figure 3 A).

Monkey preproinsulin DNA (Tab. 2; Wetekam et al., Gene 19 (1982) 179-183) integrated in the plasmid pBR 322 is isolated from the latter by digestion with the restriction endonucleases Hind III and Fsp I and is recloned into the plasmid pUC 9 as follows: the plasmid pUC 9 is cleaved with the enzyme Bam HI, a standard fill-in reaction with Klenow polymerase (large fragment) is carried out on the cleavage site, then cutting with the restriction enzyme Hind III is carried out, and the DNA is separated by gel electrophoresis in a 5% polyacrylamide gel from the other DNA fragments. It was possible to integrate the isolated insulin DNA fragment about 1250 Bp in length into the opened plasmid. To remove the untranslated region and the presequence, digestion is carried out with Hae III, and the fragment 143 Bp in length is digested under limiting enzyme conditions with Bal 31 to cleave off the last two nucleotides from the presequence. There is obtained in this way TTT as the first codon at the amino terminus, which represents phenylalanine as the first amino acid of the B chain.

An adaptor specific for Eco RI is now ligated onto this fragment in a blunt-end ligation reaction:

a)  5' AAT TAT GAA TTC GCA GGA GGC GGG GGT GGC GGT GGG
(Eco RI)    TA CTT AAG CGT CCT CCG CCC CCA CCG CCA CCC
       GGC GGA GGT GGT GGC GGT GGA GGC GGT GGA GGC GGG
       CCG CCT CCA CCA CCG CCA CCT CCG CCA CCT CCG CCC
       GGT ATG
       CCA TAC

b)  5' AAT TAT GAA TTC GCA GGA GGC GGG GGT GGC GGT GGG
(Eco RI)    TA CTT AAG CGT CCT CCG CCC CCA CCG CCA CCC
       GGC GGA GGT GGT GGC GGT GGA GGC GGT GGA GGC GGG
       CCG CCT CCA CCA CCG CCA CCT CCG CCA CCT CCG CCC
       GGT AGA
       CCA TCT

In order to prevent polymerization of the adaptors, they were used unphosphorylated in the ligation reaction. The adaptor a) has at the end a codon for methionine, and the adaptor b) has the codon for arginine. Thus, the gene product obtained by variant a) permits removal of the bacterial portion by cleavage with cyanogen bromide, whereas variant b) allows trypsin cleavage.

The ligation product is digested with Mbo II. Fractionation by gel electrophoresis results in a DNA fragment which is 139 Bp in length and has the information for amino acids Nos. 1 to 21 of the B chain.

The gene for the remaining information of the proinsulin molecule (including a G-C sequence from the cloning, and 21 Bp from pBR 322 following the stop codon) is obtained from the pUC 9 plasmid with the complete monkey preproinsulin information by digestion with Mbo II/Sma I and isolation of a DNA fragment about 240 Bp in length. The correct ligation product of about 380 Bp in length (including the adaptor of 78 Bp) is obtained by ligation of the two proinsulin fragments. The proinsulin DNA fragment constructed in this way can now be ligated together with a regulator region via the Eco RI-negative cleavage site.
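The adaptor length of 78 Bp quoted above can be checked against the upper strand of adaptor a), and the same strand can be translated to see what the adaptor encodes. The sketch builds the standard genetic code table inline; the variable and function names are chosen for this example only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Upper strand of adaptor a) exactly as printed in the specification.
adaptor_a_codons = (
    "AAT TAT GAA TTC GCA GGA GGC GGG GGT GGC GGT GGG "
    "GGC GGA GGT GGT GGC GGT GGA GGC GGT GGA GGC GGG "
    "GGT ATG"
).split()


def translate(codons: list[str]) -> str:
    """Translate a list of codons into a one-letter amino acid string."""
    return "".join(CODON_TABLE[codon] for codon in codons)


length_bp = len("".join(adaptor_a_codons))
peptide = translate(adaptor_a_codons)

print(length_bp)
print(peptide, peptide.count("G"))
```

The upper strand is 78 nucleotides long and, in the printed frame, encodes a short leader followed by 20 glycine residues and the terminal methionine, which matches the poly(amino acid) "arm" of 15 to 20 residues described earlier in the specification.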

The entire sequence of reactions is shown in Fig. 3 B, in which A, B and C denote the DNA for the relevant peptide chains of the proinsulin molecule, Ad denotes the (dephosphorylated) adaptor (a or b) and Prae is the DNA for the presequence of monkey preproinsulin.

Example 3

A chemically synthesized regulator region composed of a recognition sequence for Bam HI, the lac operator, a bacterial promoter and a ribosome binding site (RB) with an ATG start codon 6 to 14 nucleotides away from the RB, and a subsequent recognition sequence for Eco RI (Figure 3 C) is ligated via the common Eco RI overlap region with the proinsulin gene fragment obtained in the previous example. Following a double digestion with Sma I/Bam HI and a fill-in reaction of the Bam HI cleavage site with the Klenow fragment, the ligation product (about 480 Bp) is isolated by gel electrophoresis.

The fragment obtained in this way can subsequently be ligated, via a blunt-end ligation, into the pBR 322 part-plasmid shown in Figure 3 A (Figure 3). The hybrid plasmid pWI 6 is obtained.

Following transformation into the E. coli strain HB 101 and selection on ampicillin plates, the plasmid DNA of individual clones was tested for the integration of a 480 Bp fragment having the regulator region and the Bal 31-shortened proinsulin gene. To verify the correct shortening of the proinsulin gene by Bal 31 (Figure 3), the plasmids having the integrated proinsulin fragment were sequenced starting from the Eco RI cleavage site. Three of 60 sequenced clones had the desired shortening of two nucleotides (Figure 3 D).

The hybrid plasmid pWI 6 is now used as starting material for integrating the β-galactosidase gene fragments depicted in Figures 1 A and B. This reaction is presented here, by way of example, by cloning of the shortened β-galactosidase gene which is detailed in Figure 1 A under 2. and in which, in addition, all the codons for methionine and cysteine have been replaced by codons for other amino acids: equimolar amounts of the shortened and modified β-galactosidase gene sequence (composed of about 120 Bp of the amino-terminal region and 1220 Bp of the carboxyl-terminal region, flanked by 2 Eco RI cleavage sites) and of the hybrid plasmid pWI 6 cleaved with Eco RI (Figure 3 D) are ligated (Figure). Following transformation into indicator bacteria with a β-galactosidase ΔM15 deletion (The Lactose Operon, Ed. J. Beckwith and D. Zipser, Cold Spring Harbor, 1970) the only colonies of bacteria to react to the indicator dye X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside) by a blue coloration are those which have integrated the β-galactosidase gene fragment in the correct orientation. Any integration of several shortened β-galactosidase genes into a plasmid can be detected by standard analysis with restriction enzymes.

The plasmid pWZIP dMdC obtained in this way was induced with IPTG (isopropyl-β-D-thiogalactopyranoside), the inducer for β-galactosidase synthesis, and then tested for its capacity to form a fusion product of β-galactosidase protein and proinsulin. The contribution of this product to the total cellular protein is about 15 to 20%. The product is insoluble and can easily be separated from the other cell constituents and proteins by centrifugation.

The other shortened β-galactosidase genes detailed in Figures 1 A and B can be integrated in the same way into the plasmid pWI 6 (Figure 3 D) and, in E. coli, have similar synthetic capacities, with insoluble fusion proteins being obtained.

Depending on the choice of the adaptor a or b, either the proinsulin is cleaved off from the fusion protein with cyanogen bromide, or else, after digestion with trypsin, the insulin derivative B 31 Arg is obtained, from which the arginine is removed by enzymatic cleavage with carboxypeptidase B. The proinsulin or insulin liberated in this way can be purified by standard methods.

Example 4

Plasmids such as pWI 6 which are used as the starting construction for preparing fusion expression plasmids with proinsulin and β-galactosidase gene fragments can advantageously be made utilizable for the expression of other gene products. For this purpose, 10 µg of the plasmid pWI 6 are cut with the restriction enzymes Eco RI and Pvu II (Fig.). The opened and shortened plasmid is separated from the proinsulin gene fragments (324 Bp in length) on a polyacrylamide gel. Following preparative isolation of the plasmid fragment from the polyacrylamide gel, it is possible to ligate into a plasmid which has been opened in this way a chemically synthesized DNA sequence which, apart from the Pst I cleavage site, has a multiplicity of unique restriction enzyme cleavage sites:

(Eco RI), SstI, SmaI, BamHI, XbaI, SalI, PstI:
5' AA TTC GAG CTC GCC CGG GGA TCC TCT AGA GTC GAC CTG CAG
3'      G CTC GAG CGG GCC CCT AGG AGA TCT CAG CTG GAC GTC

HindIII, NruI, AvaIII, BglII, NcoI, SphI:
   CCC AAG CTT CGC GAT GCA TCA GAT CTA CCA TGG CAT GCC 3'
   GGG TTC GAA GCG CTA CGT AGT CTA GAT GGT ACC GTA CGG 5'

There is obtained in this way the plasmid pWB-1 (Fig.) which, besides a regulator region, also contains a DNA sequence which is suitable, with its multiple unique restriction enzyme cleavage sites, for the cloning of various genes of natural and chemical synthetic origin. In analogy to Example 2, a chemically synthesized adaptor with DNA for a poly(amino acid) sequence and an appropriate cleavage site can be inserted upstream of each gene. Between the regulator region and the multiple cloning site (MCS) there is a unique restriction enzyme cleavage site for Eco RI, into which each shortened β-galactosidase gene with the various mutation modalities can be ligated, in analogy to Example 3.
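That each named enzyme has exactly one recognition site within the synthetic polylinker can be verified directly on the printed upper strand. The sketch assumes that the two printed rows form one contiguous sequence. The recognition sequences are the well-known palindromic sites of the named enzymes, so scanning one strand is sufficient. Uniqueness within the complete plasmid is a separate question that this example does not address.

```python
import re

# Recognition sequences of the enzymes named above the polylinker.
SITES = {
    "SstI": "GAGCTC",
    "SmaI": "CCCGGG",
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
    "SalI": "GTCGAC",
    "PstI": "CTGCAG",
    "HindIII": "AAGCTT",
    "NruI": "TCGCGA",
    "AvaIII": "ATGCAT",
    "BglII": "AGATCT",
    "NcoI": "CCATGG",
    "SphI": "GCATGC",
}

# Upper strand as printed; joining the two rows is an assumption of this sketch.
mcs_upper = (
    "AA TTC GAG CTC GCC CGG GGA TCC TCT AGA GTC GAC CTG CAG "
    "CCC AAG CTT CGC GAT GCA TCA GAT CTA CCA TGG CAT GCC"
).replace(" ", "")


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of `site`, including overlapping ones."""
    return len(re.findall(f"(?={site})", sequence))


counts = {enzyme: count_sites(mcs_upper, site) for enzyme, site in SITES.items()}
for enzyme, count in counts.items():
    print(f"{enzyme}: {count}")
```

Every listed enzyme is found once in the polylinker, in the order in which the names are printed above the sequence.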

2 µg of DNA of the plasmid pWB-1 are cleaved with Eco RI and subsequently dephosphorylated. The shortened and modified β-galactosidase gene fragment which has been obtained from pWZIP dMdC by cleavage with Eco RI is ligated into the vector treated in this way. Transformation of E. coli cells and selection for expression results in the plasmid pWZPWB1 dMdC which contains, downstream of the shortened and modified β-galactosidase gene, the multiple cloning site described above (Fig.).

Example 5

As an example of the incorporation of a chemically synthesized gene, the preparation of a fusion protein having the amino acid sequence of salmon calcitonin to which a carboxyl-terminal glycine has been attached is described hereinafter.

The chemically synthesized DNA sequence for calcitonin which is shown in Table 3, whose nucleotide sequence is specifically suited to the codon usage of E. coli and which has at the 5' end a Bam HI "protrusion" and, upstream of the codon for the first amino acid, an adaptor coding for 15 glycine residues and a methionine codon, and at the 3' end following the proline codon a glycine codon, two stop codons and an Sph I "protrusion", is cloned into the plasmid pWZPWB1 dMdC which has been opened with the restriction enzymes Bam HI and Sph I. The plasmid pWZP dMdC calc. is obtained.

This plasmid can be transformed in a known manner into E. coli, and the corresponding fusion protein is expressed there and isolated in analogy to the proinsulin fusion protein. The galactosidase portion is cleaved off with cyanogen bromide, and the subsequent purification is carried out by known methods.

Table 1

Mutagenic primers for the conversion of codons for methionine into codons for leucine or isoleucine and for the conversion of codons for cysteine into codons for serine:

dM1  AG ACC GTT CAG ACA GAA CTG G
dM2  AG CGC CAC CAG CCA GTG CAG G
dM3  GCA AAA ATC CAG TTC GCT GGT G
dM4  C GCC AAT CCA TAT CTG TGA AA
dM5  C GGT AAT CGC AAT TTG ACC AC
dM6  A CGG GGT ATA GAT GTC TGA CA
dM7  G GCT GGT TTC AAT CAG TTG CT
dM8  C ACC AAT CCC TAT ATG GAA ACC
dC1  G ACC GTT CAG AGA GAA CTG GCG
dC2  CAG CTC GAT GGA AAA ATC CAG TTC
dC3  ATC TGC CGT GGA CTG CAA CAA
dC4  CGC CAG CTG GGA GTT CAG GCC
dC5  GCG CTC AAA AGA GGC GGC AGT
dC6  GCG CGT CCC GGA GCG CAG ACC

Table 2

Coding strand / complementary strand / amino acid(s) / restriction site

5' AAT TCT GCA AGA / 3' GA CGT TCT / (Asn) Ser Ala Arg / (EcoRI)

B 1
TTT / AAA / Phe
TAC / ATG / Tyr
GTG / CAC / Val
AAC / TTG / Asn
CTG GTG / GAC CAC / Leu Val / BstNI
CAG / GTC / Gln
TGC / ACG / Cys
GCA / CGT / Ala
CAC / GTG / His
GGG / CCC / Gly
CTG / GAC / Leu
GAG / CTC / Glu
TGC GGC / ACG CCG / Cys Gly / Fnu4HI
TCC / AGG / Ser
CAC / GTG / His
CGA / GCT / Arg
CTA / GAT / Leu
TAC / ATG / Tyr
GGC / CCG / Gly
TTC TTC / AAG AAG / Phe Phe / MboII
GTG / CAC / Val
ACA / TGT / Thr
GTG / CAC / Val
GAA GCT / CTT CGA / Glu Ala / AluI
CCC / GGG / Pro
AAG / TTC / Lys
CTC / GAG / Leu
ACC / TGG / Thr
GGC / CCG / Gly
C 1
CGC CGG / GCG GCC / Arg Arg / HpaII
GAG / CTC / Glu
GAG / CTC / Glu
GAC CCT CAG / CTG GGA GTC / Asp Pro Gln / AvaII DdeI
GTG / CAC / Val
GGG CAG / CCC GTC / Gly Gln
GAG CTG / CTC GAC / Glu Leu / AluI
GGG GGC CCT GGC GCA / CCC CCG GGA CCG CGT / Gly Gly Pro Gly Ala / HaeIII BstNI HhaI
GGC AGC CTG CAG CCC / CCG TCG GAC GTC GGG / Gly Ser Leu Gln Pro / Fnu4HI PstI Fnu4HI
TTG / AAC / Leu
GCG CTG / CGC GAC / Ala Leu / HhaI
GAG / CTC / Glu
GGG / CCC / Gly / AvaII
ATC / TAG / Ile
TCC / AGG / Ser
TGC / ACG / Cys
CTG CAG / GAC GTC / Leu Gln / PstI
AAG CGC / TTC GCG / Lys Arg / HhaI
A 1
GGC / CCG / Gly
ATC / TAG / Ile
GAG / CTC / Glu
GTG / CAC / Val
AAC / TTG / Asn
GAG / CTC / Glu
TAC / ATG / Tyr
CAG / GTC / Gln
TGC / ACG / Cys
TGC TGC / ACG ACG / Cys Cys / Fnu4HI
ACC AGC / TGG TCG / Thr Ser
TCC / AGG / Ser
CTC / GAG / Leu
TAC / ATG / Tyr
CAG CTG / GTC GAC / Gln Leu / PvuII

AAC / TTG / Asn
TAA TAG TCG ACC / ATT ATC AGC TGG / - / SalI
TGC AGC CA / ACG TCG GTT CGA / - / PstI (HindIII)

B 1, C 1 and A 1 designate the start of the B, C and A chains of monkey proinsulin.

Table 3

Triplet No.   Amino acid   Nucleotide No.   Coding strand (5' to 3')   Non-coding strand (3' to 5')
-             -            1-5              GA TCC                     G
0             Met          6-8              ATG                        TAC
1             Cys          9-11             TGC                        ACG
2             Ser          12-14            TCT                        AGA
3             Asn          15-17            AAC                        TTG
4             Leu          18-20            CTG                        GAC
5             Ser          21-23            TCG                        AGC
6             Thr          24-26            ACT                        TGA
7             Cys          27-29            TGC                        ACG
8             Val          30-32            GTT                        CAA
9             Leu          33-35            CTT                        GAA
10            Gly          36-38            GGT                        CCA
11            Lys          39-41            AAG                        TTC
12            Leu          42-44            CTT                        GAA
13            Ser          45-47            TCT                        AGA
14            Gln          48-50            CAG                        GTC
15            Glu          51-53            GAA                        CTT
16            Leu          54-56            CTT                        GAA
17            His          57-59            CAT                        GTA
18            Lys          60-62            AAA                        TTT
19            Leu          63-65            CTG                        GAC
20            Gln          66-68            CAG                        GTC
21            Thr          69-71            ACC                        TGG
22            Tyr          72-74            TAT                        ATA
23            Pro          75-77            CCG                        GGC
24            Arg          78-80            CGC                        GCG
25            Thr          81-83            ACT                        TGA
26            Asn          84-86            AAT                        TTA
27            Thr          87-89            ACC                        TGG
28            Gly          90-92            GGC                        CCG
29            Ser          93-95            TCT                        AGA
30            Gly          96-98            GGT                        CCA
31            Thr          99-101           ACC                        TGG
32            Pro          102-104          CCT                        GGA
33            Gly          105-107          GGT                        CCA
34            Stp          108-110          TAA                        ATT
35            Stp          111-113          TAG                        ATC
-             -            114-117          CAT G                      -
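The coding strand of Table 3 can be checked by translating it codon by codon. The following Python sketch is an illustration only and is not part of the original specification: it assumes the standard genetic code, takes each codon as read from the coding strand of Table 3 and confirmed against the non-coding strand, and uses the hypothetical names `CODON_TABLE`, `CODING_STRAND` and `translate`.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Coding strand as read from Table 3 (5'->3').
CODING_STRAND = (
    "GATCC "                                    # Bam HI protrusion, nt 1-5
    "ATG TGC TCT AAC CTG TCG ACT TGC GTT CTT "  # triplets 0-9
    "GGT AAG CTT TCT CAG GAA CTT CAT AAA CTG "  # triplets 10-19
    "CAG ACC TAT CCG CGC ACT AAT ACC GGC TCT "  # triplets 20-29
    "GGT ACC CCT GGT TAA TAG "                  # triplets 30-33 and two stop codons
    "CATG"                                      # Sph I protrusion, nt 114-117
).replace(" ", "")


def translate(dna):
    """Translate a DNA string codon by codon, starting at its first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Leave out the 5-nucleotide Bam HI end and the 4-nucleotide Sph I end.
open_reading_frame = CODING_STRAND[5:-4]
protein = translate(open_reading_frame)

if __name__ == "__main__":
    print(len(CODING_STRAND))
    print(protein)
```

The translation gives methionine, the 32 amino acids of salmon calcitonin, the additional carboxyl-terminal glycine and two stop codons, in agreement with the triplet and nucleotide numbering of the table.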

Claims

1. A process for the preparation of a genetically codable polypeptide by coupling the structural gene for this polypeptide in the correct reading frame via the gene for β-galactosidase, or a fragment of β-galactosidase, to a regulator region, introducing this gene structure into a bacterium, expressing therein an insoluble fusion protein, isolating it after cell disruption, and cleaving off the desired polypeptide chemically or enzymatically, which comprises replacing codons for methionine and/or arginine and/or cysteine in the gene for β-galactosidase, or for the fragment of β-galactosidase, in whole or in part, by codons of other amino acids.

2. The process as claimed in claim 1, wherein the gene structure codes for a β-galactosidase fragment of more than 250 amino acids but significantly less than the total β-galactosidase sequence.

3. The process as claimed in claim 1 or 2, wherein the β-galactosidase fragment is composed of a fusion of an amino-terminal and/or a carboxyl-terminal part-sequence of β-galactosidase.

4. The process as claimed in claim 1, 2 or 3, wherein the β-galactosidase fragment has about 300 to about 800 amino acids.

5. The process as claimed in one or more of the preceding claims, wherein the β-galactosidase fragment has about 320 to about 650 amino acids.

6. The process as claimed in one or more of the preceding claims, wherein the gene for the β-galactosidase fragment corresponds to a DNA sequence shown in Figure 1.

7. The process as claimed in one or more of the preceding claims, wherein the structural gene for the genetically codable polypeptide is coupled via an adaptor to the gene for the modified β-galactosidase or the β-galactosidase fragment.

8. The process as claimed in claim 7, wherein the adaptor codes for a poly(amino acid) sequence.

9. The process as claimed in one or more of the preceding claims, wherein a codon for an amino acid which allows chemical or enzymatic separation of the polypeptide from the β-galactosidase portion is located immediately upstream of the amino-terminal end of the structural gene.

10. The process as claimed in one or more of the preceding claims, wherein the genetically codable polypeptide is a derivative of growth hormone releasing factor, an interferon, a proinsulin, secretin, interleukin-2, a calcitonin or a hirudin.

11. A gene structure containing a regulator region, a gene for the modified β-galactosidase or a β-galactosidase fragment, in which codons for methionine and/or arginine and/or cysteine have been replaced, in whole or in part, by codons of other amino acids, and a structural gene for a genetically codable polypeptide.

12. A gene structure as claimed in claim 11, wherein the structural gene for the genetically codable polypeptide is coupled via an adaptor, which ensures the correct reading frame, to the gene for the modified β-galactosidase or the β-galactosidase fragment.

13. A vector containing a gene structure as claimed in claim 11 or 12.

14. A bacterium containing a vector as claimed in claim 13.

15. E. coli containing a vector as claimed in claim 13.

16. A fusion protein containing modified β-galactosidase or a β-galactosidase fragment as claimed in one or more of claims 1 to 9, and a eukaryotic genetically codable polypeptide.

17. A fusion protein as claimed in claim 16, wherein the eukaryotic genetically codable polypeptide portion is a derivative of growth hormone releasing factor, an interferon, a proinsulin, secretin, interleukin-2, a calcitonin or a hirudin.

DATED this 8th day of April 1988.

HOECHST AKTIENGESELLSCHAFT

EDWD. WATERS SONS
PATENT ATTORNEYS
QUEEN STREET
MELBOURNE. VIC. 3000.