Current Opinion in Biotechnology Vol. 7, No. 5, October 1996
Quality and authenticity of heterologous proteins synthesized in yeast [Review article]
Michael R Eckart, Christopher M Bussineau
Current Opinion in Biotechnology 1996, 7:525-530.
The choice of a bioprocess for commercial synthesis and purification of a recombinant protein is determined by a variety of factors, such as the intrinsic biological properties of the protein being expressed and the purpose for which it is intended, and also the economic and marketing targets. Among the objectives in the decision-making process are product yield and process productivity, which directly affect the cost of goods, and the quality and authenticity of the heterologous protein which, for regulated biologicals, undergo rigorous characterization to prove biochemical equivalence. Choice of host strain is generally made well in advance because it may significantly affect downstream recovery steps, the types of methods used for quality testing, and regulatory filing timelines.
In general, yeasts are suitable host organisms for the production of eukaryotic heterologous proteins [1] [2] [3] because they combine the well-established molecular and genetic manipulation techniques and growth characteristics associated with prokaryotes with the capability for complex post-translational modifications. A wide range of yeast species is used, including Saccharomyces cerevisiae, Pichia pastoris, Hansenula polymorpha, Kluyveromyces lactis, Schizosaccharomyces pombe and Yarrowia lipolytica.
The yeast expression system has been successfully used for nearly two decades in both basic research and the biotechnology industry for the production and secretion of heterologous proteins of human, animal, plant or viral origin. In many instances, it has been used to produce heterologous proteins that are unavailable from traditional sources, either because they cannot be isolated in sufficient amounts from natural sources, or because they are specially designed mutant proteins that are not found in nature. This has also allowed structural and functional studies of medically and pharmacologically relevant proteins.
This brief review will not attempt to summarize all of the heterologous proteins that have been expressed in different yeast systems, but will focus on the different issues that are important considerations for the generation of authentic recombinant proteins.
The genetic background of both the natural and recombinant host yeast strains may affect the quantity and structure of heterologous proteins produced by the cell. A low rate of random genetic instability is inherent in any biological system. In plasmid-based systems, this may be due to a mutational event, such as a point mutation, deletion, insertion, or genetic recombination, or to plasmid loss. Of the many possible random mutations that occur, the vast majority will result in no demonstrable effect on the heterologous protein produced. However, it should be borne in mind that a single uncorrected mutation that results in an amino acid change may lead to structural alterations in the backbone of the protein or detectable activity loss. Only if a growth advantage is imparted to particular cells as a result of an alteration at the nucleic acid level will the event be amplified so as to affect a significant percentage of the recombinant protein produced [4].
In high-copy number plasmid-based systems, the level of altered protein produced due to a mutation in the heterologous gene remains low because multiple copies of the recombinant gene exist. Thus, assuming that all gene copies are expressed with approximately equal efficiency, the product from one mutated copy would comprise only a very small fraction of the total authentic product. Aside from a change in the DNA sequence in the coding region of the heterologous protein, the genetic background of a host strain can influence the level of transcription, translation efficiency, the secretory pathway, protein quality, plasmid stability and plasmid copy number.
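The dilution argument above can be sketched as simple arithmetic. The following is a minimal illustration, assuming (as the text does) that every gene copy is expressed with equal efficiency; the function name and the copy numbers used are hypothetical and are not taken from any cited study.

```python
from fractions import Fraction


def mutant_fraction(copy_number: int, mutated_copies: int = 1) -> Fraction:
    """Fraction of total product derived from mutated gene copies.

    Assumes all copies are expressed with equal efficiency, which is the
    simplifying assumption made in the text.
    """
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    if not 0 <= mutated_copies <= copy_number:
        raise ValueError("mutated_copies must lie between 0 and copy_number")
    return Fraction(mutated_copies, copy_number)


# Illustrative copy numbers only (hypothetical values).
for n in (1, 10, 50):
    share = float(mutant_fraction(n))
    print(f"{n:>3} copies per cell: {share:.1%} of product altered")
```

Under this assumption, the altered share falls in inverse proportion to copy number; unequal expression between copies, or selection for the mutated copy, would break the simple 1/N relationship.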
The importance of the genetic background of yeast strains is best illustrated by the use of appropriate protease-deficient strains to improve the quality and yields of a number of heterologous proteins [5]. Mutant strains carrying the genetic markers pep4, prb and prc, which are deficient in the major vacuolar proteases, are often used. To ensure both the quality and consistency of a recombinant DNA-derived protein, fermentation processes that require relatively few generations to obtain high yield have been developed using well-characterized cell banks that have had any major instabilities screened out [4].
In contrast to the modification of the DNA sequence, alterations introduced at the translational or post-translational level will be limited to that molecule and not further amplified. The result can be a heterogeneous mixture of polypeptides. Proteins with differing properties will only be readily detected if present in large amounts. Codon usage can play a key role in regulating gene expression and in the production of large quantities of high-quality heterologous protein. At the translational level, errors can occur due to missense insertions by tRNAs whose anticodons match two out of three codon bases. Mistranslation can also occur if the mRNA from a highly expressed heterologous gene contains rare codons, or if the amino acid distribution is inordinately skewed relative to typical yeast proteins. The latter two possibilities can lead to ribosomal pausing and frameshifting, thus reducing either the amount or the quality of the desired gene product. Optimization of the codons in a cloned gene to fit the preferences of the yeast cell can therefore reduce mistranslation; this is relatively simple for small proteins but becomes more complex as the size of the protein increases.
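A first step toward the codon optimization described above is simply to locate rare codons in the coding sequence. The sketch below is one hedged way to do this; the function name, the example sequence and the two-codon "rare" set are all placeholders chosen for illustration, and a real analysis would substitute a codon usage table for the specific host strain.

```python
from typing import List, Set, Tuple


def find_rare_codons(cds: str, rare_codons: Set[str]) -> List[Tuple[int, str]]:
    """Return (codon_number, codon) for each in-frame codon found in rare_codons.

    Codon numbering is 1-based. RNA input (U) is converted to DNA (T).
    """
    seq = "".join(cds.split()).upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    hits = []
    for start in range(0, len(seq), 3):
        codon = seq[start:start + 3]
        if codon in rare_codons:
            hits.append((start // 3 + 1, codon))
    return hits


# Placeholder set for illustration only; supply a host-specific table in practice.
EXAMPLE_RARE = {"CGG", "CGA"}

# Hypothetical 5-codon open reading frame: ATG CGG AAA CGA TAA
print(find_rare_codons("ATGCGGAAACGATAA", EXAMPLE_RARE))
```

Flagged positions are candidates for synonymous replacement; the scan says nothing about other factors, such as mRNA structure, that also influence translation.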
The authenticity of the mature protein product, which is dependent on specific post-translational modifications for biological activity, solubility, biodistribution, circulatory half-life or stability, may represent a more important consideration than the highest level of expression. This is the case when heterologous genes are expressed in cells that lack the appropriate modification system, or if the production of a heterologous protein exceeds the post-translational capacity of the host cell. Many post-translational modifications of proteins are made correctly in the yeast cell.
Yeast recombinant proteins have been derived from both intracellular and secretion expression systems. Intracellular expression is the system of choice for heterologous proteins that are normally expressed in the cytoplasm, as well as for those secreted proteins that have no or few disulfide bonds. In certain cases, yeast, like higher eukaryotes, performs the post-translational removal of the initiator methionine from cytoplasmically expressed proteins by means of the yeast methionyl aminopeptidase. In instances where the amino-terminal methionine residue is retained, immunogenicity problems can arise in medical applications. In addition to the removal of the amino-terminal methionine, yeast also carries out amino-terminal acetylation, carboxy-terminal methylation, myristylation and farnesylation. The latter three are important in the membrane targeting of intracellularly expressed proteins. Yeast is also capable of assembling intracellularly expressed oligomeric proteins such as hepatitis B surface antigen [6] [7] [8] and the α and β subunits of mammalian Na+,K+-ATPase [9] [10]. Recently, Hofman et al. [11] reported the high-level expression and functional assembly in S. cerevisiae of the three human embryonic hemoglobins Gower I (ζ2ε2), Gower II (α2ε2) and Portland (ζ2γ2). Each protein is a functional tetramer of the general form a2b2. The different chains were correctly processed at their amino termini and the four heme groups were bound to the protein as in the native protein. The amino-terminal amino acid of the recombinant chain is blocked by an acetyl group, as is the case in the naturally occurring molecule.
The ability of yeast to perform various post-translational modifications and to target proteins to specific cell membranes has led to its evaluation as an expression system for membrane proteins [12] [13]. Membrane proteins require complex post-translational modifications and interactions with other proteins, such as molecular chaperones, to attain their final conformation and insertion into membranes. Because the factors that are important for membrane protein insertion and folding are poorly understood, there are still only a few examples of highly expressed heterologous membrane proteins in yeast. Often, the initial expression experiments are performed in a host that is homologous to the source of the membrane protein. Plant [14] and fungal membrane proteins [13] are much more readily expressed in S. cerevisiae than are mammalian membrane proteins [9] [15]. However, recently reported studies on yeast expression of human proteins, including the erythroid anion exchanger AE1 [16], the emopamil-binding protein [17], the multiple drug resistance related P-glycoprotein [12], the -adrenergic receptor [13], the receptors for dopamine [13], transferrin (AM Williams, CA Enns, FASEB J Abstr 1995, 9:867), retinoid X [18] and estrogen [19], and of the Neurospora crassa plasma membrane H+-ATPase [20], have shown that sufficient amounts of functional protein suitable for biochemical, pharmacological and biophysical analysis can be obtained. Higher levels of expression of the G-protein-coupled human dopamine and mouse 5-HT5A serotonin receptors can be achieved in S. pombe [21] and P. pastoris [22] than in S. cerevisiae. In addition, surface expression in yeast is generating substantial interest in applications such as biocatalysis, whole-cell vaccines, and combinatorial library presentation [23].
Low or undetectable direct cytoplasmic expression of an authentic protein is a relatively common problem. In such instances, the problem can be overcome by fusion of the desired protein to stable proteins such as human superoxide dismutase (hSOD) or human -interferon (-IFN). The hSOD fusion approach has been used to overexpress a number of viral polypeptides for diagnostic purposes [24] [25]. These yeast-expressed proteins have been purified and incorporated into diagnostic kits for the screening of blood for HIV-1 and hepatitis C virus (HCV) antibodies. Often, the fusion strategy results in an insoluble product that must be extracted and subjected to an in vitro cleavage and refolding process; this cleavage and refolding occurs with variable efficiency and may be difficult for certain complex structures, resulting in a low yield of high-quality, authentic product. The difficulty of refolding and of generating recombinant proteins with authentic amino termini may be overcome by fusion to ubiquitin. The ubiquitin expression system takes advantage of the processing of the ubiquitin precursor in vivo by an endogenous yeast hydrolase. The ubiquitin fusion approach has two distinct advantages when making products intracellularly. First, it can significantly enhance the yield and/or stability of proteins that are otherwise unstable in the cytosol, the amino-terminal ubiquitin moiety of the fusion possibly preventing immediate degradation of the fusion partner [26] [27]. Second, it can generate a protein with any desired amino terminus, apart from proline, because ubiquitin hydrolase processing is largely independent of the amino terminus of the protein fused to ubiquitin.
The secretion of heterologous proteins into the culture medium offers a way to avoid toxicity from accumulated material and to simplify purification of the protein, because yeast secretes only very low levels of native proteins. Furthermore, the passage of the proteins through the secretory pathway permits post-translational events such as proteolytic maturation, glycosylation and disulfide bond formation to occur. The secretion of heterologous proteins is driven by a cleavable, amino-terminal signal sequence that is derived from either the native protein or the leaders of the S. cerevisiae prepro-α-mating factor or invertase. Numerous examples have appeared in the recent literature that demonstrate the capability of yeast to secrete mature human polypeptides possessing the expected biological, and in most cases chemical, properties: single chain Fv fragments [28] [29], erythropoietin receptor (M Nair, K Harris, Blood Abstr 1995, 86:62), thrombomodulin (CW White, LR White, EA Komives, FASEB J Abstr 1995, 9:1404), factor XII (NM Kempi, EA Komives, FASEB J Abstr 1995, 9:1405), gastric lipase [30], fibroblast collagenase (proMMP1) [31], analogs of tissue factor pathway inhibitor [32], and interleukin-8 [33]. In all cases, specific structure/function relationships such as catalytic activity, ligand binding, ATP-dependent efflux, specific anti-idiotype binding and clot promotion were found to be comparable with the native counterpart. Not surprisingly, yeast also secretes nonhuman proteins, such as coffee bean α-galactosidase [34], bovine enterokinase [35], and Schistosoma mansoni cathepsin B [36], that are equivalent to the native molecule. Cathepsin B is a good example of a protein that could not be expressed successfully in bacterial or insect cell host systems, but was secreted in a functional, authentic form by yeast.
Often, the removal of the α-factor leader sequences by the Kex2 protease (Kex2p) is incomplete, resulting in hyperglycosylated secreted material with amino-terminal extensions. Kjeldsen et al. [37] have recently shown that the introduction of an amino-terminal spacer between the α-factor leader and the insulin precursor significantly improved Kex2p processing. The spacer peptide is then removed either in vitro or in vivo by a specific protease or yeast aspartyl protease 3, respectively. This modification also increased fermenter yield of the insulin precursor by 215% [37].
The secretion of properly folded proteins, so crucial for correct biological activity, is one of the major factors taken into account when considering yeast as the preferred host for heterologous protein expression. This is because the direct capture of active product from conditioned medium eliminates the need for costly and low-yielding cell disruption or refolding process steps. Although there are some exceptions, most notably human serum albumin, which is secreted at 4 g/l by P. pastoris [3], the high productivity expected from the design of high-copy number vectors containing the foreign gene driven by very strong promoters in yeast is not usually obtained with secreted proteins. To better understand the mechanistic nature of this problem, Wittrup and colleagues [38] have been studying bovine pancreatic trypsin inhibitor (BPTI) as a model of secretion and the role of folding in the endoplasmic reticulum (ER). In this system, it was found that the level of expression driven by a multicopy (>50), 2µ-based plasmid vector was equivalent to that driven by a single copy construct. There was higher gene transcription in the multicopy system, as expected. However, the binding of conformationally specific antibodies showed that the majority of the protein in cells harboring the multicopy construct was improperly folded and retained in the lumen of the ER [38]. This suggests that the limitation for expression through the eukaryotic secretion pathway may be the maintenance of correct intralumenal tertiary structure.
High expression of granulocyte-colony stimulating factor (G-CSF), erythropoietin (EPO) or S. pombe acid phosphatase in S. cerevisiae leads to low levels of extractable heavy chain binding protein (BiP) and protein disulfide isomerase (PDI), which are key proteins involved in lumenal folding. Because the BiP synthesis rate was not reduced as a result of G-CSF expression and the level of BiP could not be restored by its coexpression, it was suggested that foldase losses could be due to degradation or aggregation (possibly as a result of their own misfolding) [39]. Unfortunately, although studies such as these have increased our peripheral understanding of secretion, the exact mechanism by which lumenal proteins play a role in folding and secretion has yet to be elucidated and employed to unlock the full productivity potential of yeast.
Glycosylation is both the most common and the most complex form of post-translational modification [40]. The majority of therapeutically relevant proteins appear glycosylated in nature and therefore must be glycosylated in order to display the correct biological activity. Thus the monitoring of glycosylation patterns in the quality control of recombinant therapeutic proteins to assure product safety, efficacy and consistency has been growing in importance. It is postulated that the general function of protein glycosylation is to aid in the folding of the nascent polypeptide chain and in the stabilization of the conformation of the mature glycoprotein [41]. The polypeptide, the host-cell phenotype and the environment in which the cell is maintained can all determine the glycosylation pattern, and hence the quality of the resulting glycoprotein.
Although a number of recombinant glycoproteins with pharmaceutical or industrial value have been obtained using the yeast expression system, they often have altered biological properties and functions compared to their native counterparts. This is mostly due to differences in protein glycosylation. Alteration in the normal glycosylation patterns of therapeutic proteins may affect their in vivo function with respect to solubility, sensitivity to proteases, serum half-life, or biological activities, such as targeting to specific cells or interaction with specific receptors. Yeast cells recognize the same type of N-glycosylation recognition sequence as higher eukaryotic cells, indicating that, potentially at least, they glycosylate at the same sites as higher eukaryotic cells. However, the glycosyl groups on yeast glycoproteins consist primarily of mannose residues appended in different linkages to the core glycosyl units. Because the recombinant glycoproteins generated in S. cerevisiae are of the high-mannose type, they will be recognized by mannose receptors on various cells and removed when injected into the circulation of mammalian species. In addition, nonhuman glycosylation patterns are potentially immunoreactive. Hyperglycosylation very often negates any advantages that the microbial eukaryote S. cerevisiae might have over E. coli or mammalian cells. Mannan mutants (mnn) that exhibit less elaborate N-linked glycosylation have been isolated. These mutants, however, do not grow as well as other yeast strains. Other yeast species such as P. pastoris and H. polymorpha seem less prone to hyperglycosylating heterologous proteins [2] [42]. The average mannose chain length of proteins produced in the latter two yeasts is only 8-14 monomers, compared to 50-100 units in S. cerevisiae.
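Because yeast and higher eukaryotes recognize the same type of N-glycosylation sequence, candidate sites can be located directly from the primary sequence. The sketch below assumes the commonly cited consensus Asn-X-Ser/Thr, where X is any residue except proline; this motif is not spelled out in the text above and is stated here as an assumption. The function name and the example peptide are hypothetical.

```python
import re
from typing import List

# Assumed consensus sequon: N, then any residue except P, then S or T.
# The lookahead keeps matches zero-width past the N so that
# overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues in candidate N-glycosylation sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical peptide: N2-G-S and N9-K-T fit the motif; N6-P-T does not.
print(find_sequons("MNGSANPTNKT"))  # prints [2, 9]
```

A matching sequon only marks a potential site. Whether it is occupied, and with which glycan, depends on the host cell and culture conditions, which is the point the surrounding discussion makes.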
Therefore, in many cases, mammalian cells are the preferred host cells for the generation of recombinant glycoprotein therapeutics. Recently, however, it has been suggested by Sadhukhan and Sen [43] that not all the steps involved in eukaryotic protein trafficking and post-translational modification may be equivalent in yeast and higher eukaryotes. This stems from work performed on the testicular isozyme of angiotensin-converting enzyme (ACET), in which the contributions of each of the five potential N-glycosylation sites of ACET toward its synthesis, glycosylation, intracellular transport, cleavage-secretion and enzymatic activity were studied. One particular mutant was unglycosylated, enzymatically inactive and rapidly degraded in HeLa cells, whereas the same mutant was synthesized, N-glycosylated and properly transported in P. pastoris. The O-linked glycans of S. cerevisiae also differ from those of higher eukaryotes. Stratton-Thomas et al. [44] reported that the human urokinase plasminogen activator (uPA) epidermal growth factor-like domain is post-translationally modified by an unusual O-linked fucosylation in both naturally isolated uPA and recombinant uPA produced in mammalian culture, but not in S. cerevisiae.
The ability to genetically engineer yeast cells, together with growing knowledge of the host machinery necessary to form the complex carbohydrate structures found on many mammalian proteins, provides the opportunity to specifically optimize the host-cell background so that it produces proteins of pharmaceutical interest with the desired carbohydrate structures. However, because it is not possible to make meaningful generalizations about the optimal glycosylation pattern of recombinant therapeutics, each glycoprotein must be individually assessed in the context of both the system in which it is expressed and the desired clinical benefit.
A certain degree of heterogeneity within a recombinant glycoprotein is permitted by regulatory agencies provided that efficacy, safety and consistency can be demonstrated. Glycosylation consistency can also be monitored and comparative studies of structure and function are becoming easier with the development of automated and sensitive new techniques [45] [46] [47].
With the growing number of proteins being generated by bioengineering, it is becoming increasingly important to understand the roles different expression systems play in the synthesis process. Insights into the mechanisms of protein modification will ultimately lead to increased yields of authentic protein products. Modern biotechnology has presented a proven and very viable means by which proteins may be successfully commercialized, as exemplified by Novo Nordisk's process for the production of recombinant human insulin for the European market, their newly launched process for factor VIIa, and Immunex's GM-CSF. In addition, industrial production of the food enzyme chymosin relies on the yeast Kluyveromyces lactis. A number of other mammalian and nonmammalian proteins are equally important from a clinical or fundamental standpoint but are still under development.
It is not yet understood in full detail which molecular processes are required in the host organism to synthesize proteins with the stringent quality attributes of parenteral drugs. Although all the information required for the folding of a polypeptide into the correct three-dimensional structure is contained within the primary amino acid sequence, our understanding of the role of secondary protein modifications in the regulation of activity, and of the fundamental relationship between protein structure and function, is still in its infancy.