Elsevier

Journal of Biotechnology

Volume 151, Issue 2, 20 January 2011, Pages 159-165

Predicting cell-specific productivity from CHO gene expression

https://doi.org/10.1016/j.jbiotec.2010.11.016

Abstract

Improving the rate of recombinant protein production in Chinese hamster ovary (CHO) cells is an important consideration in controlling the cost of biopharmaceuticals. We present the first predictive model of productivity in CHO bioprocess culture based on gene expression profiles. The dataset used to construct the model consisted of transcriptomic data from 70 stationary phase, temperature-shifted CHO production cell line samples, for which the cell-specific productivity had been determined. These samples were utilised to investigate gene expression over a range of high to low monoclonal antibody and Fc-fusion-producing CHO cell lines. We utilised a supervised regression algorithm, partial least squares (PLS) incorporating jackknife gene selection, to produce a model of cell-specific productivity (Qp) capable of predicting Qp to within 4.44 pg/cell/day root mean squared error in cross model validation (RMSECMV). The final model, consisting of 287 genes, was capable of accurately predicting Qp in a further panel of 10 additional samples which were incorporated as an independent validation. Several of the genes constituting the model are linked with biological processes relevant to protein metabolism.

Introduction

Cell and process engineering approaches to improve productivity in bioreactors have largely focussed on reactor design and culture strategies such as clonal selection, stability, medium formulation, culture temperature and cell engineering for controlled proliferation and increased resistance to apoptosis (Altamirano et al., 2000, Butler, 2005, Prentice et al., 2007, Wurm, 2004). Using this approach, key cell line characteristics, including cell growth rate, achievable cell densities and correct product processing are identified only following a lengthy labour-intensive screening process. To complement these strategies, previous attempts have been made to modify or improve the performance of these lines in the bioreactor using cellular engineering strategies (reviewed in Mohan et al., 2008). However, these studies have demonstrated only incremental improvements in productivity and the cellular processes underpinning Qp remain poorly understood in Chinese hamster ovary (CHO) and other bioprocess-relevant cell lines.

The development of expression profiling methodologies such as microarrays and proteomics offers the prospect of examining the molecular phenotypes underlying productivity in CHO, and their application in bioprocess research has already been extensively reviewed (Griffin et al., 2007). Previous microarray expression profiling studies focussing on productivity in CHO (Doolan et al., 2008, Schaub et al., 2010, Trummer et al., 2008, Kantardjieff et al., 2010, Yee et al., 2007) and in the commercially used mouse myeloma NS0 cell line (Charaniya et al., 2009, Khoo et al., 2007, Seth et al., 2007) have identified several crucial pathways and processes. These microarray-based productivity studies have also been complemented by proteomics studies in CHO (Carlage et al., 2009, Meleady et al., 2008, Nissom et al., 2006) and NS0 (Seth et al., 2007, Smales et al., 2004, Alete et al., 2005, Dinnis et al., 2006).

To date, profiling studies in CHO have been characterised by relatively small numbers of samples (typically < 20) compared in a case/control format. Interesting genes and protein candidates are generally prioritised via the traditional paradigm of differential expression (i.e. fold change). A significant drawback of this approach is the need to select an appropriate threshold (considering the inherently noisy nature of microarrays), which can result in too few or too many genes being identified and in inconsistent comparisons with studies on similar biological systems. This limitation is further compounded by the observation that changes in productivity levels are usually accompanied by only modest changes in gene expression levels (Smales et al., 2004, Yee et al., 2009). Larger sample numbers in combination with more sophisticated algorithms can therefore make a significant contribution to identifying the molecular mechanisms underpinning productivity in CHO.

Multivariate statistics and machine learning algorithms for classification and regression allow relationships between genes to be considered and have previously been advocated over univariate gene selection methods (Boulesteix and Strimmer, 2007). Partial least squares (PLS) is a statistical modelling technique closely related to principal component analysis (PCA) and is used to construct predictive models for complex multidimensional datasets. PLS components, known as latent variables (LVs), are derived from linear combinations of the original variables to maximise the covariance between a matrix of independent variables (e.g. gene expression) and dependent variable(s) (e.g. productivity). By retaining only those LVs containing the majority of information on the relationship between predictor and response variables (thus removing a substantial amount of noise and measurement error) a model can then be formed between these LVs and cell-specific productivity. Detailed treatments of the PLS algorithm have been previously described (Martens and Naes, 1989).
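The latent-variable construction described above can be illustrated with a minimal single-response PLS (PLS1) sketch. This is not the authors' implementation: it uses a textbook NIPALS-style deflation, omits the jackknife gene selection and cross model validation steps, and runs on a tiny made-up matrix in place of microarray data. The function names `fit_pls1` and `predict_pls1` are hypothetical.

```python
import math


def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model by successive deflation.

    X: list of samples, each a list of predictor values (e.g. gene expression).
    y: list of response values (e.g. Qp), one per sample.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred predictors
    f = [v - y_mean for v in y]                                # centred response
    components = []
    for _ in range(n_components):
        # Weights: direction in predictor space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no covariance left to model
        w = [v / norm for v in w]
        # Scores: the latent variable, a linear combination of the predictors.
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate: remove what this latent variable explains from X and y.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def predict_pls1(model, x):
    """Predict the response for one new sample by replaying the deflation."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Made-up illustration data: 5 samples, 2 predictors, exactly linear response.
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
y_demo = [2 * a - b + 3 for a, b in X_demo]
demo_model = fit_pls1(X_demo, y_demo, n_components=2)
print(predict_pls1(demo_model, [6.0, 4.0]))
```

In a real setting the number of latent variables would be far smaller than the number of genes and would be chosen by cross-validation; the sketch uses as many components as predictors only so that the toy fit is exact.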

Previous examples of PLS predictive model generation from microarrays include regression (Gidskehaug et al., 2007, Huang et al., 2004, Misra et al., 2007), the development of models for classification (Aaroe et al., 2010, Nguyen and Rocke, 2002a) and proportional hazard models for survival analysis (Nguyen and Rocke, 2002b). Apart from microarrays, the technique is utilised across a variety of fields and has previously been applied to various aspects of bioprocessing including mass spectrometry-based proteomic profiling, process monitoring and process analytical technology (PAT) (Sellick et al., 2010, Stansfield et al., 2007, Thomassen et al., 2010).

In this paper, we construct a regression model using the PLS algorithm to capture the relationship between gene expression and a quantitative phenotypic variable (cell-specific productivity). We aim to produce a model for prediction of Qp from gene expression measurements with a potential application in bioprocess development. The use of a gene selection routine coupled with rigorous statistical validation was incorporated to reduce PLS model complexity and decrease the error rate. The algorithm may also provide a vehicle for the identification of subsets of genes relevant to the biology underlying productivity of recombinant proteins in CHO. This work represents one of the largest studies of CHO transcriptomic datasets published to date.

Determination of cell-specific productivity

The concentration of recombinant protein product in conditioned media samples (volumetric titre) was determined by Protein-A HPLC. Cell viability was determined using the trypan blue dye-exclusion viability assay and hemocytometer counting (for shake flask samples) or a Cedex Automated Cell Culture Analyzer (Roche Innovatis) (for bioreactor samples). Cell specific productivity was determined as shown below.

$$Q_p\ (\text{pg/cell/day}) = \frac{\text{titre}_2 - \text{titre}_1}{\text{density}_2 - \text{density}_1} \times \text{daily growth rate}$$

where
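As a worked illustration of the Qp calculation, the following sketch computes cell-specific productivity from two time points. The definition of the daily growth rate is cut off in the text above, so the helper `daily_growth_rate` assumes the common exponential-growth form ln(density2/density1)/days; this is an assumption, not the authors' stated definition. Units (titre in pg/mL, density in cells/mL), the function names and the example numbers are likewise hypothetical.

```python
import math


def daily_growth_rate(density1, density2, days):
    """ASSUMPTION: exponential growth, mu = ln(density2 / density1) / days.

    The source text is truncated before it defines this term.
    """
    return math.log(density2 / density1) / days


def cell_specific_productivity(titre1, titre2, density1, density2, growth_rate):
    """Qp = (titre2 - titre1) / (density2 - density1) * daily growth rate.

    Assumed units: titre in pg/mL, density in cells/mL, growth rate per day,
    giving Qp in pg/cell/day.
    """
    if density2 == density1:
        # The ratio is undefined when the cell density does not change.
        raise ValueError("density2 must differ from density1")
    return (titre2 - titre1) / (density2 - density1) * growth_rate


# Made-up example values, for illustration only.
mu = daily_growth_rate(density1=1.0e6, density2=2.0e6, days=1.0)
qp = cell_specific_productivity(
    titre1=1.0e7, titre2=3.0e7, density1=1.0e6, density2=2.0e6, growth_rate=mu
)
print(f"Qp = {qp:.2f} pg/cell/day")
```

The explicit check for equal densities is a design choice of this sketch: the formula divides by the change in cell density, so it cannot be evaluated when that change is zero.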

Production CHO cell line sample dataset

Fig. 2 illustrates the range of Qp measurements (pg protein/cell/day) for each of the 70 CHO production cell line samples for which gene expression measurements were obtained using the WyeHamster2a microarray; these samples form the calibration data used to build the PLS model. As can be seen, there is a relatively uniform spread across the range of Qp values up to 30 pg protein/cell/day, whereafter seven remaining samples constituted the range of Qp values from ∼30 to 50 pg

Conclusions

In conclusion, we describe a substantial (70-sample calibration set, 10-sample testing set) production cell line transcriptomic study which has been used to develop the first predictive model for specific productivity in CHO to within 4.44 Qp units (pg/cell/day). A multivariate regression algorithm has been applied to the data in order to construct a proof of concept model for the prediction of Qp based on 287 (212 annotated) key genes retained during an intensive selection procedure. The study

Acknowledgements

This work was supported by funding from Science Foundation Ireland (SFI) grant number 07/IN.1/B1323.

References (50)

  • C. Ambroise et al.

    Selection bias in gene extraction on the basis of gene-expression data

    Proceedings of the National Academy of Sciences of the United States of America

    (2002)
  • N. Blow

    Transcriptomics: the digital generation

    Nature

    (2009)
  • B.M. Bolstad et al.

    A comparison of normalization methods for high density oligonucleotide array data based on variance and bias

    Bioinformatics

    (2003)
  • A.L. Boulesteix et al.

    Partial least squares: a versatile tool for the analysis of high-dimensional genomic data

    Briefings in Bioinformatics

    (2007)
  • M. Butler

    Animal cell cultures: recent achievements and perspectives in the production of biopharmaceuticals

    Applied Microbiology and Biotechnology

    (2005)
  • T. Carlage et al.

    Proteomic profiling of a high-producing Chinese hamster ovary cell culture

    Analytical Chemistry

    (2009)
  • S. Charaniya et al.

    Mining transcriptome data for function-trait relationship of hyper productivity of recombinant antibody

    Biotechnology and Bioengineering

    (2009)
  • J.Y. Chung et al.

    Effect of doxycycline-regulated calnexin and calreticulin expression on specific thrombopoietin productivity of recombinant Chinese hamster ovary cells

    Biotechnology and Bioengineering

    (2004)
  • D.M. Dinnis et al.

    Functional proteomic analysis of GS-NS0 murine myeloma cell lines with varying recombinant monoclonal antibody production rate

    Biotechnology and Bioengineering

    (2006)
  • P. Doolan et al.

    Transcriptional profiling of gene expression changes in a PACE-transfected CHO DUKX cell line secreting high levels of rhBMP-2

    Molecular Biotechnology

    (2008)
  • M.R. Downham et al.

    Endoplasmic reticulum protein expression in recombinant NS0 myelomas grown in batch culture

    Biotechnology and Bioengineering

    (1996)
  • B. Efron et al.

    The jackknife estimate of variance

    Annals of Statistics

    (1981)
  • P. Filzmoser et al.

    Repeated double cross validation

    Journal of Chemometrics

    (2009)
  • L. Gidskehaug et al.

    A framework for significance analysis of gene expression data using dimension reduction methods

    BMC Bioinformatics

    (2007)
  • N.V. Hayes et al.

    Protein disulfide isomerase does not control recombinant IgG4 productivity in mammalian cell lines

    Biotechnology and Bioengineering

    (2010)
    1

    Both authors contributed equally to this publication.
