Novel Strategy for Protein Exploration: High-throughput Screening Assisted with Fuzzy Neural Network

https://doi.org/10.1016/j.jmb.2005.05.026

To engineer proteins with desirable characteristics from a naturally occurring protein, high-throughput screening (HTS) combined with a directed evolution approach is an essential technology. However, most HTS techniques are simple positive screenings. The information obtained from the positive candidates is used only as a result and rarely as a clue for understanding the structural rules that may explain the protein activity.

Here, we have attempted to establish a novel strategy for exploring functional proteins with the aid of computational analysis. As a model case, we explored lipases with inverted enantioselectivity for the substrate p-nitrophenyl 3-phenylbutyrate, starting from the wild-type lipase of Burkholderia cepacia KWI-56, which is originally selective for the (S)-configuration of the substrate. Data from our previous work on (R)-enantioselective lipase screening were applied to a fuzzy neural network (FNN), a bioinformatic algorithm, to extract guidelines for the subsequent screening and engineering processes. An advantage of FNN is that it extracts hidden rules that lie between the sequences of variants and their enzyme activity while achieving high prediction accuracy.

Without any prior knowledge, FNN predicted a rule indicating that “size at position L167,” among four positions (L17, F119, L167, and L266) in the substrate binding core region, is the most influential factor for obtaining lipase with inverted (R)-enantioselectivity. Based on the guidelines obtained, newly engineered variants that were not found in the actual screening were experimentally proven to gain high (R)-enantioselectivity through engineering of the size at position L167. We also designed and assayed two novel variants, namely FIGV (L17F, F119I, L167G, and L266V) and FFGI (L17F, L167G, and L266I), which were compatible with the guideline obtained from FNN analysis, and confirmed that these designed lipases could acquire high inverted enantioselectivity. The results show that, with the aid of bioinformatic analysis, high-throughput screening can expand its potential for exploring vast combinatorial sequence spaces of proteins.
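The four-letter variant names used above can be read as the residues at positions 17, 119, 167, and 266, in that order. The following is a minimal sketch of that reading, not code from the study; the names `WILD_TYPE` and `decode_variant` are hypothetical, and the wild-type residues are taken from the position labels (L17, F119, L167, L266) given in the text.

```python
# Wild-type residues at the four substrate-binding positions named in the text.
# Assumption: a variant name lists the residues at these positions in
# ascending position order (17, 119, 167, 266).
WILD_TYPE = {17: "L", 119: "F", 167: "L", 266: "L"}


def decode_variant(name: str) -> list[str]:
    """Turn a four-letter variant name such as 'FIGV' into mutation labels."""
    positions = sorted(WILD_TYPE)
    if len(name) != len(positions):
        raise ValueError(f"expected {len(positions)} letters, got {len(name)}")
    mutations = []
    for pos, new in zip(positions, name.upper()):
        old = WILD_TYPE[pos]
        if new != old:  # a residue identical to wild type is not a mutation
            mutations.append(f"{old}{pos}{new}")
    return mutations


print(decode_variant("FIGV"))  # ['L17F', 'F119I', 'L167G', 'L266V']
print(decode_variant("FFGI"))  # ['L17F', 'L167G', 'L266I']
```

Under this reading, FFGI carries only three substitutions because position 119 keeps its wild-type phenylalanine, which agrees with the mutation list given for that variant.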

Introduction

To create proteins with desirable properties from a natural protein by mutation and selection, exhaustive screening processes including library construction and assay of numerous samples of the library are required.1 The size of the library, which determines the labor of screening, grows enormously when such a massive combinatorial space of protein sequences is to be explored. For example, to examine the mutational effect on a protein based on only five residues, 3,200,000 (= 20^5) variants would theoretically have to be covered. In such cases, a conventional screening strategy that experimentally creates every mutation is not feasible, owing to the limited throughput of assay devices and the incompleteness of selection that is inherent in high-throughput technology.
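The library-size figure quoted above follows from simple counting: each randomized position can hold any of the 20 standard amino acids, so n positions give 20^n sequences. A short sketch of that arithmetic is given below; the function name `library_size` is hypothetical and used only for illustration.

```python
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_size(positions: int, alphabet: str = AMINO_ACIDS) -> int:
    """Number of distinct sequences when each position is fully randomized."""
    return len(alphabet) ** positions


for n in (1, 2, 4, 5):
    print(f"{n} randomized position(s): {library_size(n):,} variants")
```

For five positions this gives 3,200,000 variants, matching the figure in the text; for four positions, the number of sites later targeted in the lipase, the same count is 160,000.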

Here, we propose a novel screening strategy introducing bioinformatic analysis to assist, revise, and integrate the high-throughput screening (HTS) process for the efficient exploration of combinatorial sequence space. As a model study, this strategy was applied to explore a novel enzyme with inverted enantioselectivity.

The importance of enantiomerically pure compounds has been widely expanding in pharmaceutical, agricultural, and synthetic organic chemistry fields.2, 3, 4 Enzymes, which are the biological enantioselective proteins, have been the key tool for the effective synthesis of these compounds.

In genetic engineering of enzymes, evolutionary methods, which include random mutagenesis combined with HTS, have proved their effectiveness in tuning the enantioselectivity of enzymes.4, 5, 6, 7 Error-prone polymerase chain reaction (PCR) followed by saturation mutagenesis has succeeded in effectively inverting the enantioselectivity of hydantoinase toward d,l-5-(2-methylthioethyl) hydantoin.8 Reetz et al. have reported the effectiveness of combining error-prone PCR with DNA shuffling for obtaining a lipase variant of Pseudomonas aeruginosa with complete enantioselectivity inversion.9, 10 The same strategy has been successfully applied to other enzymes.11 These methods have increased the probability of obtaining wider variation in samples derived from a limited source. However, the screening process still requires a great deal of labor, involving several cycles of screening thousands of variants with every round of random mutation, saturation mutation, or shuffling.

One directed approach to library screening is the use of focused mutation at rationally determined positions in the enzyme. Recently, such a strategy has been shown to be effective in obtaining a bacterial lipase mutant with inverted enantioselectivity.12 In that study, the mutation sites were rationally determined based on a three-dimensional (3D) structural model of the intermediate complex between the enzyme and an enantioactive substrate, which had already been used as a model substrate in a directed evolution experiment on an esterase,13 and mutational variants were obtained using a novel high-throughput technology, namely SIMPLEX (single-molecule-PCR-linked in vitro expression).14, 15, 16

Here, we attempted to construct a new strategy to effectively screen lipases with inverted enantioselectivity, from the (S)-form substrate to the (R)-form substrate, using a fuzzy neural network (FNN). The basic idea of our strategy is to feed the data from HTS into FNN and to use the result of the analysis as feedback for designing a more effective experiment to obtain the objective enzyme.

FNN is a type of artificial neural network that automatically constructs complex model structures by learning the hidden relationship between input and output data, and it functions as a predictor.17 Compared with conventional artificial neural networks, FNN has an advantageous feature, i.e. the “fuzzy layer.” This enables the interpretation of the model structure and the extraction of the quantified relationship between input and output values as “a rule,” designated a “fuzzy rule.”16 Such a feature is significant for a prediction program in protein design, where conventional artificial neural networks function only as a black box tool. We have utilized this feature of FNN by applying it to a wide range of research fields to predict significant factors and factor combinations in complex phenomena, such as industrial manufacturing,18, 19, 20 coffee taste modeling,21 peptide prediction,22 and gene profiles from microarray data.23, 24
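To make the idea of a readable "fuzzy rule" concrete, the sketch below shows a generic single-input fuzzy inference step. It is an illustration under stated assumptions, not the authors' FNN: the sigmoid membership functions, the parameters `center` and `slope`, the rule weights in `RULES`, and the function names are all hypothetical, and the input is assumed to be a residue-size value scaled to the range 0 to 1.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def memberships(x: float, center: float = 0.5, slope: float = 10.0) -> dict[str, float]:
    """Grades of membership of x in the fuzzy sets 'small' and 'big'.

    The two grades always sum to 1, so every input is partly 'small'
    and partly 'big' rather than falling into one hard category.
    """
    big = sigmoid(slope * (x - center))
    return {"small": 1.0 - big, "big": big}


def predict(x: float, consequents: dict[str, float]) -> float:
    """Output = sum over rules of (membership grade * rule consequent)."""
    grades = memberships(x)
    return sum(grades[label] * consequents[label] for label in grades)


# Hypothetical rule table, readable as:
#   IF size is small THEN score is 0.9
#   IF size is big   THEN score is 0.1
RULES = {"small": 0.9, "big": 0.1}

for size in (0.1, 0.5, 0.9):
    print(size, round(predict(size, RULES), 3))
```

In a trained network the membership parameters and rule consequents would be fitted to data; the point of the sketch is only that the fitted table can be read back as IF-THEN statements, which is what distinguishes this kind of model from a black-box predictor.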

To our knowledge, this work is the first trial in which high-throughput technology and computational technology have actually been combined in an interactive manner for effective enzyme engineering. Our objective was, with less labor input and without prior knowledge, to use data from the first screening as clues for the subsequent protein engineering and, in turn, to explore novel enzymes that were missed in the HTS. The scheme, together with experimental results and future prospects, is discussed.

Section snippets

Data acquisition of enantioselective lipase variants and experimental scheme

As a model study, an integrated protein exploration strategy combining high-throughput technology (in this case, SIMPLEX) and FNN, a program that automatically extracts rules to interpret complex phenomena, was used for obtaining new enzymes with inverted enantioselectivity. The research scheme is described briefly in Figure 1. In normal HTS, only the few selected winners give us meaningful information (Figure 1(a)–(d)) and the others are discarded. On the other hand, in our proposed strategy

Discussion

The objective of the present study was to establish a novel strategy for exploring proteins with a selective activity, by combining HTS technology with bioinformatic analysis (Figure 1). Here, as a model case, we attempted to explore lipases with the objective enantioselectivity. It is important to note that in most bioinformatic studies, the prediction results are only validated by in silico data and rarely validated by following up with actual experiments. However, in our study, we designed

Screening of inverted enantioselective lipase from variant library

The variant library of the Burkholderia cepacia KWI-56 lipase was constructed with single-molecule PCR and cell-free protein synthesis, termed the SIMPLEX (single-molecule-PCR-linked in vitro expression) method. The lipase variants were screened for enantioselectivity inverted from the (S)-form substrate to the (R)-form substrate. To assay the lipase activity, 10 μl of cell-free reaction solution from the variant library was added to 90 μl of lipase assay solution (2 mM of (R)- or (S)-p-nitrophenyl

Acknowledgements

This study was partly supported by Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (Nos. 16360411, 15360439 and 17206082). We also acknowledge the Hori Informational Science Foundation for financial support.

References (28)

  • J. Kyte et al. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. (1982)
  • J.M. Zimmerman et al. The characterization of amino acid sequences in proteins by statistical methods. J. Theor. Biol. (1968)
  • R. Patel et al. Enzymatic synthesis of chiral intermediates for pharmaceuticals. J. Ind. Microbiol. Biotechnol. (2003)
  • D.D. Ryu et al. Recent progress in biomolecular engineering. Biotechnol. Prog. (2000)