In Silico Prediction of the Peroxisomal Proteome in Fungi, Plants and Animals

https://doi.org/10.1016/S0022-2836(03)00553-9

Abstract

In an attempt to improve our ability to predict peroxisomal proteins, we have combined machine-learning techniques for analyzing peroxisomal targeting signals (PTS1) with domain-based cross-species comparisons between eight eukaryotic genomes. Our results indicate that this combined approach has a significantly higher specificity than earlier attempts to predict peroxisomal localization, without a loss in sensitivity. This allowed us to predict 430 peroxisomal proteins, almost all of which lack a localization annotation. These proteins can be grouped into 29 families covering most of the known steps in all known peroxisomal pathways. In general, plants have the highest number of predicted peroxisomal proteins, and fungi the smallest number.

Introduction

Peroxisomes, along with glyoxysomes of plants and glycosomes of trypanosomes, belong to the microbody family of organelles. These three types of microbodies exist in different cellular environments and possess distinct specialized functions. They house an important set of enzymes within their single membrane, and at the very least they all contain one hydrogen-peroxide-producing oxidase and a catalase to decompose the hydrogen peroxide.1 Peroxisomes contain enzymes involved in lipid metabolism, such as β-oxidation of fatty acids and the synthesis of cholesterol, bile acids and plasmalogens in mammals, as well as enzymes of the glyoxylate cycle in plants, of methanol oxidation in yeasts,2,3 and of part of the glycolytic pathway in kinetoplastid parasites.4

The importance of the peroxisome is underscored by the existence of numerous human genetic disorders associated with peroxisomal defects. Deficiency of a single peroxisomal enzyme is the cause of several human diseases.5,6 However, the most severe peroxisomal disorders originate from defects in peroxisome biogenesis, with the simultaneous loss of several metabolic functions. These disorders, known as the peroxisomal biogenesis disorders (PBDs) and exemplified by Zellweger syndrome, are genetically heterogeneous, with 12 known complementation groups.7

The biogenesis of peroxisomal matrix proteins is fairly well understood.8,9,10 Both the protein targeting and import mechanism into microbodies and the components required for peroxisomal biogenesis are evolutionarily conserved.11,12 Peroxisomal proteins are nuclear-encoded and synthesized in the cytosol on free polyribosomes.1 Peroxisomes acquire their matrix proteins by post-translational import from the cytosol via two pathways that rely on two kinds of conserved peroxisomal targeting signals (PTS). The majority of peroxisomal matrix proteins have a PTS1 at their extreme carboxyl terminus, consisting of just three amino acids (SKL) or a conservative variant thereof.8,11 A few peroxisomal enzymes (malate dehydrogenase, citrate synthase, acyl-CoA oxidase and 3-ketoacyl-CoA-thiolase) are known to use a different targeting signal, the amino-terminally located PTS2, which is a bipartite signal with the consensus sequence [RK]-[LVI]-x5-[HQ]-[LA].13
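To make the two signals concrete, the following minimal Python sketch checks a sequence for the canonical PTS1 tripeptide SKL at the C terminus and for the PTS2 consensus [RK]-[LVI]-x5-[HQ]-[LA], reading x5 as five arbitrary residues. The function names, the 50-residue N-terminal search window and the toy sequences are assumptions made for illustration; they are not taken from the paper, and a consensus match alone does not establish peroxisomal import.

```python
import re

# PTS2 consensus quoted in the text: [RK]-[LVI]-x5-[HQ]-[LA]
# ("x5" is read here as five arbitrary residues).
PTS2_PATTERN = re.compile(r"[RK][LVI].{5}[HQ][LA]")


def has_canonical_pts1(sequence: str) -> bool:
    """Return True if the sequence ends in the canonical tripeptide SKL."""
    return sequence.strip().upper().endswith("SKL")


def find_pts2(sequence: str, n_terminal_window: int = 50):
    """Return the 0-based start of the first PTS2 consensus match, or None.

    The 50-residue window is an illustrative assumption expressing
    "amino-terminally located"; it is not a value from the paper.
    """
    match = PTS2_PATTERN.search(sequence.strip().upper()[:n_terminal_window])
    return match.start() if match else None


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    print(has_canonical_pts1("MAAASKL"))
    print(find_pts2("MQRLQVVLGHLAAA"))
```

Conservative PTS1 variants are deliberately not covered here, since a single fixed tripeptide cannot represent them.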

Although most peroxisomal matrix proteins use PTSs for their targeting, there are a few proteins that lack a canonical targeting signal and that might enter the peroxisomal matrix by “piggybacking” on other proteins bearing PTSs.14,15

The peroxisomal protein import machinery requires around 20 PEX genes and their products, the peroxins.8 PTS1 interacts with the tetratricopeptide repeats (TPRs) of the receptor Pex5p.16 Proteins bearing PTS2 bind to the WD40 motifs of Pex7p.17 After binding their cargo proteins, these receptors are thought to interact with a docking complex consisting of Pex13p and Pex14p, where both pathways seem to merge,18,19 and then to shuttle back into the cytosol for the next round of targeting. Several peroxins involved in the protein import machinery have now been characterized, but little is known about the principles of the translocation process.10

Considering the biological and medical importance of the peroxisome, identifying peroxisomal proteins from the amino acid sequence is an important challenge in bioinformatics. Previous attempts to predict peroxisomal localization based on amino acid sequence include PSORT,20,21 a knowledge-based predictor using a decision tree to sort proteins among several different compartments. In PSORT, the PTS1 motif [AS]-[HKR]-L is used as a marker for peroxisomal location along with amino acid composition over the entire protein. The performance on peroxisomal proteins is modest, in the sense that many peroxisomal proteins are missed. Cai et al.22 applied a support vector machine (SVM) to predict protein localization based on both amino acid composition and sequence. Though the overall performance was fair, the results for the peroxisomal subset were poor. Geraghty et al.23 used a pattern-based method to scan the Saccharomyces cerevisiae ORFs for potential peroxisomal proteins. Including both PTS1 and PTS2 motifs in their search, they found 18 new potential peroxisomal proteins. GFP fusions allowed them to confirm that about half of these proteins were truly located in the peroxisome. Another way to predict PTS1-containing proteins is to use the PROSITE24,25 pattern, [ACGNST]-[HKR]-[AFILMVY], for microbody C-terminal targeting signals, but this pattern also finds many non-peroxisomal proteins.
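The difference between the two C-terminal patterns quoted above can be illustrated with a short Python sketch that tests the last three residues of a sequence against each of them. Only the two patterns come from the text; the function name and the toy sequences are assumptions for illustration, and the sketch reproduces neither PSORT, which also uses amino acid composition, nor the PROSITE scanning tools.

```python
import re

# C-terminal tripeptide patterns quoted in the text.
PSORT_PTS1 = re.compile(r"[AS][HKR]L$")                # [AS]-[HKR]-L
PROSITE_PTS1 = re.compile(r"[ACGNST][HKR][AFILMVY]$")  # [ACGNST]-[HKR]-[AFILMVY]


def pts1_hits(sequence: str) -> dict:
    """Report which of the two patterns matches the C-terminal tripeptide."""
    seq = sequence.strip().upper()
    return {
        "psort": bool(PSORT_PTS1.search(seq)),
        "prosite": bool(PROSITE_PTS1.search(seq)),
    }


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    for toy in ("MAAASKL", "MAAANKM", "MAAAPKL"):
        print(toy, pts1_hits(toy))
```

The PSORT motif admits 2 × 3 × 1 = 6 tripeptides, whereas the PROSITE pattern admits 6 × 3 × 7 = 126 of the 8000 possible tripeptides, which is consistent with the stricter motif missing true peroxisomal proteins and the looser one matching many non-peroxisomal proteins.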

For lack of data on PTS2 proteins, we have chosen to focus on proteins carrying the C-terminal PTS1. In this work, we have (i) constructed an amino acid sequence-based predictor for PTS1-mediated peroxisomal targeting, including both a motif identification step and a machine learning module, (ii) coupled our predictor with the subcellular localization predictors TargetP and TMHMM to improve performance, (iii) applied our prediction scheme to the predicted proteins (ORFs) of eight eukaryotic genomes, (iv) searched for and clustered homologs between the predicted sets from these eight genomes in order to reinforce localization prediction in a manner inspired by phylogenetic profile analysis,26,27,28 and finally (v) expanded the clusters by searching for proteins with domain composition identical to that of the proteins in the clustered sets.
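Step (v) can be sketched in Python as a lookup on domain composition. The sketch assumes that "identical domain composition" means the same domains with the same copy numbers, irrespective of order; the text does not specify this, and the data layout, function names and domain labels below are hypothetical rather than the authors' implementation.

```python
from collections import defaultdict


def domain_signature(domains):
    """Order-independent, copy-number-aware signature of a domain list.

    Assumption: domain order along the sequence is ignored.
    """
    return tuple(sorted(domains))


def expand_clusters(clusters, candidates):
    """Assign candidates to clusters that contain a member with the same
    domain composition.

    clusters:   {cluster_id: {protein_id: [domain, ...]}}
    candidates: {protein_id: [domain, ...]}
    Returns {cluster_id: sorted list of matching candidate ids}.
    """
    signature_to_clusters = defaultdict(set)
    for cluster_id, members in clusters.items():
        for domains in members.values():
            signature_to_clusters[domain_signature(domains)].add(cluster_id)

    expanded = defaultdict(list)
    for protein_id, domains in candidates.items():
        if not domains:
            continue  # proteins without annotated domains cannot be matched
        for cluster_id in signature_to_clusters.get(domain_signature(domains), ()):
            expanded[cluster_id].append(protein_id)
    return {cid: sorted(ids) for cid, ids in expanded.items()}


if __name__ == "__main__":
    # Hypothetical domain labels and protein identifiers.
    clusters = {"c1": {"p1": ["dom_A", "dom_B"]}, "c2": {"p2": ["dom_C"]}}
    candidates = {"x": ["dom_B", "dom_A"], "y": ["dom_A"], "z": [], "w": ["dom_C"]}
    print(expand_clusters(clusters, candidates))
```

If domain order were considered significant, `domain_signature` could return `tuple(domains)` instead, which would make the match stricter.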

By combining these different approaches, we identify a set of strongly predicted peroxisomal proteins from eight eukaryotic genomes. This set enables us to make cross-genomic comparisons and to make an initial guess at the contents of the different peroxisomal proteomes.

The PTS1 motif and the data sets

As described in Methods, we initially extracted 152 peroxisomal proteins with a true PTS1 as well as 308 non-peroxisomal proteins with a PTS1-like C-terminal tripeptide from Swiss-Prot.29 It cannot be ruled out that PTS1 from animals, plants and fungi could show organelle specificity within the microbody family; however, the data sets would be considerably reduced by additional subdivision into glyoxysomal and peroxisomal proteins. Also, it has been shown that glyoxysomal proteins can be

Predicting the Peroxisomal Proteome in Silico

Recently, attempts to systematically identify peroxisomal proteins have been carried out in different model organisms. In greening cotyledons of Arabidopsis thaliana, 53 proteins were analyzed by MALDI-TOF mass spectrometry and 29 were identified.39 In S. cerevisiae, 19 soluble proteins were identified from 1D-gel electrophoresis.40 Using 2D-gels from liver and kidney tissues from Mus musculus we have separated approximately 70 proteins (S.C., unpublished data). Taking all this into account, a

Biological Implications

Our prediction method has succeeded in extracting the majority of peroxisomal proteins described in the literature; moreover, several novel proteins with possible roles in peroxisomal biochemistry have been found. In this final section, we analyse the predictions for several peroxisomal pathways in some more detail.

Conclusions

We have developed a scheme for predicting peroxisomal localization of proteins, using information from the sequence in the form of the PTS1 motif and a machine learning-based module that analyses the PTS1-adjacent region. The PeroxiP predictor alone seems to work better than previous attempts at constructing peroxisomal localization predictors. The main difficulty of predicting PTS1-targeted peroxisomal proteins is that PTS1-like C-terminal tripeptides are found in many non-peroxisomal

Data sets for predictor construction

Sequence data were collected from SWISS-PROT release 39.27.29 PTS1-containing sequences were searched for among the entries containing the annotation “SUBCELLULAR LOCATION: *{PEROXISOMAL|GLYOXYSOMAL|GLYCOSOMAL}” using the label “MICROBODY TARGETING SIGNAL” where found, or otherwise including sequences with clear annotation about peroxisomal location and with a C-terminal tripeptide similar to any confirmed PTS1. One hundred and fifty-six sequences were found this way. In a manual control, four

Acknowledgements

This project was supported by grants from The Swedish Research Council (G.vH., A.E. and S.C.), The Swedish Foundation for Strategic Research (A.E. and G.vH.) and the Carl Trygger foundation (S.C. and A.E.).

References (64)

  • H. Hayashi et al. A novel acyl-CoA oxidase that can oxidize short-chain acyl-CoA in plant peroxisomes. J. Biol. Chem. (1999)
  • M.F. Lensink et al. Response of SCP-2L domain of human MFE-2 to ligand removal: binding site closure and burial of peroxisomal targeting signal. J. Mol. Biol. (2002)
  • W.L. Kovacs et al. Central role of peroxisomes in isoprenoid biosynthesis. Prog. Lipid Res. (2002)
  • S.J. Perantonis et al. Efficient perceptron learning using constrained steepest descent. Neural Netw. (2000)
  • P.B. Lazarow et al. Biogenesis of peroxisomes. Annu. Rev. Cell Biol. (1985)
  • H. van den Bosch et al. Biochemistry of peroxisomes. Annu. Rev. Biochem. (1992)
  • R.J. Wanders et al. Peroxisomal fatty acid α and β-oxidation in humans: enzymology, peroxisomal metabolite transporters and peroxisomal diseases. Biochem. Soc. Trans. (2001)
  • R.J.A. Wanders et al. The Peroxisome
  • R.J.A. Wanders et al. The Peroxisome
  • S.G. Gould et al. The Peroxisome
  • S. Subramani et al. Import of peroxisomal matrix and membrane proteins. Annu. Rev. Biochem. (2000)
  • V.I. Titorenko et al. The life cycle of the peroxisome. Nature Rev. Cell Biol. (2001)
  • S.J. Gould et al. Peroxisomal-protein import: is it really that complex? Nature Rev. Cell Biol. (2002)
  • S.J. Gould et al. A conserved tripeptide sorts proteins to peroxisomes. J. Cell Biol. (1989)
  • S. Subramani. Protein import into peroxisomes and biogenesis of the organelle. Annu. Rev. Cell Biol. (1993)
  • B.W. Swinkels et al. A novel, cleavable peroxisomal targeting signal at the amino-terminus of the rat 3-ketoacyl-CoA thiolase. EMBO J. (1991)
  • P.E. Purdue et al. Peroxisome biogenesis. Annu. Rev. Cell Dev. Biol. (2001)
  • V.I. Titorenko et al. Acyl-CoA oxidase is imported as a heteropentameric, cofactor-containing complex into peroxisomes of Yarrowia lipolytica. J. Cell Biol. (2002)
  • G.J. Gatto et al. Peroxisomal targeting signal-1 recognition by the TPR domains of human PEX5. Nature Struct. Biol. (2000)
  • E.J. Neer et al. The ancient regulatory-protein family of WD-repeat proteins. Nature (1994)
  • W. Girzalsky et al. Involvement of Pex13p in Pex14p localization and peroxisomal targeting signal 2-dependent protein import into peroxisomes. J. Cell Biol. (1999)
  • P. Horton et al. Better prediction of protein cellular localization sites with the k nearest neighbors classifier. ISMB (1997)