Predicting cell-specific productivity from CHO gene expression
Introduction
Cell and process engineering approaches to improving productivity in bioreactors have largely focussed on reactor design and culture strategies such as clonal selection, stability, medium formulation and culture temperature, and on cell engineering for controlled proliferation and increased resistance to apoptosis (Altamirano et al., 2000, Butler, 2005, Prentice et al., 2007, Wurm, 2004). With this approach, key cell line characteristics, including cell growth rate, achievable cell density and correct product processing, are identified only after a lengthy, labour-intensive screening process. To complement these strategies, previous studies have attempted to modify or improve the performance of these lines in the bioreactor using cellular engineering strategies (reviewed in Mohan et al., 2008). However, these studies have demonstrated only incremental improvements in productivity, and the cellular processes underpinning cell-specific productivity (Qp) remain poorly understood in Chinese hamster ovary (CHO) and other bioprocess-relevant cell lines.
The development of expression profiling methodologies such as microarrays and proteomics offers the prospect of examining the molecular phenotypes underlying productivity in CHO, and their application in bioprocess research has already been extensively reviewed (Griffin et al., 2007). Previous microarray expression profiling studies of productivity in CHO (Doolan et al., 2008, Schaub et al., 2010, Trummer et al., 2008, Kantardjieff et al., 2010, Yee et al., 2007) and in the commercially used mouse myeloma NS0 cell line (Charaniya et al., 2009, Khoo et al., 2007, Seth et al., 2007) have identified several crucial pathways and processes. These microarray-based productivity studies have been complemented by proteomics studies in CHO (Carlage et al., 2009, Meleady et al., 2008, Nissom et al., 2006) and NS0 (Seth et al., 2007, Smales et al., 2004, Alete et al., 2005, Dinnis et al., 2006).
To date, profiling studies in CHO have been characterised by relatively small numbers of samples (typically < 20) compared in a case/control format. Candidate genes and proteins are generally prioritised using the traditional paradigm of differential expression (i.e. fold change). A significant drawback of this approach is the need to choose an appropriate fold-change threshold: given the inherently noisy nature of microarray data, the choice can result in too few or too many genes being identified, and it makes comparison with studies of similar biological systems inconsistent. This limitation is compounded by the observation that changes in productivity are usually accompanied by only modest changes in gene expression (Smales et al., 2004, Yee et al., 2009). Larger sample numbers combined with more sophisticated algorithms could therefore make a significant contribution to identifying the molecular mechanisms underpinning productivity in CHO.
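The threshold problem can be illustrated with a minimal sketch. The fold changes below are hypothetical values chosen for illustration only, not data from this study; the point is that when most expression changes are modest, the number of genes called "differentially expressed" depends strongly on the chosen cutoff. The function name `count_differential` is likewise hypothetical.

```python
import numpy as np

def count_differential(fold_changes, threshold):
    """Count genes whose fold change passes a symmetric cutoff.

    A gene passes if |log2(FC)| >= log2(threshold), so up- and
    down-regulation by the same factor are treated alike.
    """
    log_fc = np.log2(np.asarray(fold_changes, dtype=float))
    return int(np.sum(np.abs(log_fc) >= np.log2(threshold)))

# Hypothetical fold changes (high-Qp vs low-Qp) for ten genes, mostly modest.
fc = [1.1, 1.3, 0.8, 1.6, 0.6, 1.25, 2.1, 0.9, 1.4, 0.45]

for cutoff in (1.2, 1.5, 2.0):
    print(f"cutoff {cutoff}: {count_differential(fc, cutoff)} genes")
```

For these ten hypothetical genes, the cutoffs 1.2, 1.5 and 2.0 select 8, 4 and 2 genes respectively, even though the underlying data are identical.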
Multivariate statistics and machine learning algorithms for classification and regression allow relationships between genes to be considered and have previously been advocated over univariate gene selection methods (Boulesteix and Strimmer, 2007). Partial least squares (PLS) is a statistical modelling technique closely related to principal component analysis (PCA) that is used to construct predictive models for complex, multidimensional datasets. PLS components, known as latent variables (LVs), are linear combinations of the original variables chosen to maximise the covariance between a matrix of independent variables (e.g. gene expression) and one or more dependent variables (e.g. productivity). By retaining only the LVs that carry most of the information on the relationship between predictor and response variables (thereby discarding a substantial amount of noise and measurement error), a regression model can then be formed between these LVs and cell-specific productivity. Detailed treatments of the PLS algorithm are available elsewhere (Martens and Naes, 1989).
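To make the construction of LVs concrete, the following is a minimal NumPy sketch of single-response PLS (PLS1) fitted with the NIPALS algorithm, one standard way of computing PLS components. It is not presented as the implementation used in this study; the function names `pls1_fit` and `pls1_predict` are hypothetical, and X is only mean-centred here (no scaling to unit variance), which is an assumption.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS.

    X: (n_samples, n_genes) expression matrix; y: (n_samples,) response such as Qp.
    Returns (beta, x_mean, y_mean) so that y_hat = (X_new - x_mean) @ beta + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # centred, then deflated each step
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                           # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                              # LV scores, one per sample
        tt = t @ t
        p = Xk.T @ t / tt                       # X loadings for this LV
        qk = (yk @ t) / tt                      # y loading for this LV
        Xk = Xk - np.outer(t, p)                # remove the LV from X
        yk = yk - qk * t                        # remove the LV from y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)      # coefficients on the original genes
    return beta, x_mean, y_mean

def pls1_predict(X_new, beta, x_mean, y_mean):
    """Predict the response for new samples from a fitted PLS1 model."""
    return (np.asarray(X_new, dtype=float) - x_mean) @ beta + y_mean
```

With as many components as there are linearly independent predictors, PLS1 reproduces ordinary least squares; the benefit for expression data comes from keeping far fewer components, with the number usually chosen by cross-validation.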
Previous applications of PLS to microarray data include regression models (Gidskehaug et al., 2007, Huang et al., 2004, Misra et al., 2007), classification models (Aaroe et al., 2010, Nguyen and Rocke, 2002a) and proportional hazards models for survival analysis (Nguyen and Rocke, 2002b). Beyond microarrays, the technique is used across a variety of fields and has been applied to several aspects of bioprocessing, including mass spectrometry-based proteomic profiling, process monitoring and process analytical technology (PAT) (Sellick et al., 2010, Stansfield et al., 2007, Thomassen et al., 2010).
In this paper, we construct a regression model using the PLS algorithm to capture the relationship between gene expression and a quantitative phenotypic variable (cell-specific productivity). We aim to produce a model for predicting Qp from gene expression measurements, with potential application in bioprocess development. A gene selection routine coupled with rigorous statistical validation was incorporated to reduce PLS model complexity and the prediction error. The approach may also help identify subsets of genes relevant to the biology underlying recombinant protein productivity in CHO. This work represents one of the largest studies of CHO transcriptomic datasets published to date.
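One principle behind rigorous validation of a gene selection routine is that the selection must be repeated inside each cross-validation training fold; selecting genes on the full dataset first lets the held-out samples influence the choice and biases the error estimate downwards. The sketch below shows that structure under stated assumptions: genes are ranked by absolute Pearson correlation with the response, and ordinary least squares stands in for the PLS step to keep the example short. The selection criterion, the fold scheme and the function names are hypothetical illustrations, not the procedure used in this paper.

```python
import numpy as np

def select_top_genes(X, y, k):
    """Return indices of the k genes with the largest |Pearson r| against y.

    Assumes no gene is constant across samples (that would divide by zero).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:k]

def cv_rmse_with_selection(X, y, k, n_folds=5, seed=0):
    """K-fold CV in which gene selection is redone on every training fold.

    Returns the pooled cross-validated RMSE and the gene set chosen per fold.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_errors, chosen = [], []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        genes = select_top_genes(X[train_idx], y[train_idx], k)  # training data only
        chosen.append(set(genes.tolist()))
        # Least-squares fit with an intercept on the selected genes (PLS stand-in).
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx][:, genes]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, genes]])
        sq_errors.extend((A_test @ coef - y[test_idx]) ** 2)
    return float(np.sqrt(np.mean(sq_errors))), chosen
```

Because each fold may choose a slightly different gene set, the per-fold selections also give a rough indication of how stable the selected genes are.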
Determination of cell-specific productivity
The concentration of recombinant protein product in conditioned media samples (volumetric titre) was determined by Protein-A HPLC. Cell viability was determined using the trypan blue dye-exclusion assay with haemocytometer counting (for shake flask samples) or a Cedex Automated Cell Culture Analyzer (Roche Innovatis) (for bioreactor samples). Cell-specific productivity was determined as shown below, where
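As an illustration only, a common way of computing Qp is to divide the increase in product titre over an interval by the integral of viable cell density (IVCD) over the same interval, with the IVCD obtained by the trapezoidal rule. This definition is an assumption and the formula used in the study may differ; the function names and the units (days, viable cells/mL, titre in mg/L) are likewise hypothetical choices.

```python
import numpy as np

def ivcd(times_days, vcd_cells_per_ml):
    """Integral of viable cell density (cells*day/mL) by the trapezoidal rule."""
    t = np.asarray(times_days, dtype=float)
    v = np.asarray(vcd_cells_per_ml, dtype=float)
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t)))

def specific_productivity(times_days, vcd_cells_per_ml, titre_mg_per_l):
    """Qp in pg/cell/day as titre increase divided by IVCD (assumed definition).

    1 mg/L equals 1e6 pg/mL, so the titre change is converted to pg/mL
    before dividing by the IVCD in cells*day/mL.
    """
    titre = np.asarray(titre_mg_per_l, dtype=float)
    delta_pg_per_ml = (titre[-1] - titre[0]) * 1e6
    return delta_pg_per_ml / ivcd(times_days, vcd_cells_per_ml)
```

For example, a culture held at 1e6 viable cells/mL for 4 days (IVCD of 4e6 cells·day/mL) whose titre rises from 0 to 80 mg/L has a Qp of 20 pg/cell/day under this definition.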
Production CHO cell line sample dataset
Fig. 2 illustrates the range of Qp measurements (pg protein/cell/day) for each of the 70 CHO production cell line samples for which gene expression was measured using the WyeHamster2a microarray; these samples form the calibration data used to build the PLS model. Qp values are spread relatively uniformly up to 30 pg protein/cell/day, above which the remaining seven samples span Qp values from ∼30 to 50 pg
Conclusions
In conclusion, we describe a substantial production cell line transcriptomic study (70-sample calibration set, 10-sample test set), which has been used to develop the first predictive model for cell-specific productivity in CHO, accurate to within 4.44 Qp units (pg/cell/day). A multivariate regression algorithm was applied to the data to construct a proof-of-concept model for predicting Qp from 287 key genes (212 annotated) retained during an intensive selection procedure. The study
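The excerpt does not name the error measure behind the 4.44 pg/cell/day figure. Assuming it is a root mean squared error of prediction (RMSEP) on the held-out test samples, a common choice for PLS regression, it would be computed as in this minimal sketch; the function name `rmsep` is hypothetical.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean squared error of prediction, in the same units as Qp."""
    residuals = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))
```

Under this assumption, a model "accurate to within 4.44 Qp units" has a typical (root mean square) deviation of 4.44 pg/cell/day between measured and predicted Qp on the test samples; individual predictions can deviate by more.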
Acknowledgements
This work was supported by funding from Science Foundation Ireland (SFI) grant number 07/IN.1/B1323.
References (50)
- Reducing over-optimism in variable selection by cross-model validation. Chemometrics and Intelligent Laboratory Systems (2006).
- Cross model validation and optimisation of bilinear regression models. Chemometrics and Intelligent Laboratory Systems (2008).
- Advancing mammalian cell culture engineering using genome-scale technologies. Trends in Biotechnology (2007).
- Transcriptome and proteome analysis of Chinese hamster ovary cells under low temperature and butyrate treatment. Journal of Biotechnology (2010).
- Overexpression of heat shock proteins (HSPs) in CHO cells for extended culture viability and improved recombinant protein production. Journal of Biotechnology (2009).
- Modified Jack-knife estimation of parameter uncertainty in bilinear modelling by partial least squares regression (PLSR). Food Quality and Preference (2000).
- Calnexin overexpression sensitizes recombinant CHO cells to apoptosis induced by sodium butyrate treatment. Cell Stress & Chaperones (2009).
- Gene expression profiling of peripheral blood cells for early detection of breast cancer. Breast Cancer Research (2010).
- Proteomic analysis of enriched microsomal fractions from GS-NS0 murine myeloma cells with varying secreted recombinant monoclonal antibody productivities. Proteomics (2005).
- Improvement of CHO cell culture medium formulation: simultaneous substitution of glucose and glutamine. Biotechnology Progress (2000).
- Selection bias in gene extraction on the basis of gene-expression data. Proceedings of the National Academy of Sciences of the United States of America.
- Transcriptomics: the digital generation. Nature.
- A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics.
- Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics.
- Animal cell cultures: recent achievements and perspectives in the production of biopharmaceuticals. Applied Microbiology and Biotechnology.
- Proteomic profiling of a high-producing Chinese hamster ovary cell culture. Analytical Chemistry.
- Mining transcriptome data for function-trait relationship of hyper productivity of recombinant antibody. Biotechnology and Bioengineering.
- Effect of doxycycline-regulated calnexin and calreticulin expression on specific thrombopoietin productivity of recombinant Chinese hamster ovary cells. Biotechnology and Bioengineering.
- Functional proteomic analysis of GS-NS0 murine myeloma cell lines with varying recombinant monoclonal antibody production rate. Biotechnology and Bioengineering.
- Transcriptional profiling of gene expression changes in a PACE-transfected CHO DUKX cell line secreting high levels of rhBMP-2. Molecular Biotechnology.
- Endoplasmic reticulum protein expression in recombinant NS0 myelomas grown in batch culture. Biotechnology and Bioengineering.
- The jackknife estimate of variance. Annals of Statistics.
- Repeated double cross validation. Journal of Chemometrics.
- A framework for significance analysis of gene expression data using dimension reduction methods. BMC Bioinformatics.
- Protein disulfide isomerase does not control recombinant IgG4 productivity in mammalian cell lines. Biotechnology and Bioengineering.
1 Both authors contributed equally to this publication.