Journal of Molecular Biology
Sequence Variability Analysis of Human Class I and Class II MHC Molecules: Functional and Structural Correlates of Amino Acid Polymorphisms
Introduction
Classical major histocompatibility complex (MHC) molecules are cell-surface glycoproteins that are central to the process of adaptive immunity, functioning to capture and display peptides on the surface of antigen-presenting cells (APCs).1 These plasma membrane-bound peptide–MHC complexes (pMHC) are scrutinized by T-lymphocytes via their T-cell receptors (TCRs) during immunosurveillance.2 Since T-cells recognizing self-peptides are eliminated during the process of thymic selection, those pMHC incorporating foreign peptides are the primary focus of T-cell-mediated immune responses.3
There are two major classes of MHC molecules, termed class I and class II.4 MHC class I (MHCI) molecules are expressed on most cells, bind endogenously derived peptides of 8 to 10 amino acid residues, and are recognized by CD8 cytotoxic T-lymphocytes (CTLs). In contrast, MHC class II (MHCII) molecules are present only on specialized APCs, bind exogenously derived peptides of 9 to 22 residues, and are recognized by CD4 helper T-cells. These differences indicate that MHCI and MHCII molecules engage two distinct arms of the T-cell-mediated immune response, the former targeting invasive pathogens such as viruses for destruction by CD8 CTLs, and the latter inducing cytokine-based inflammatory mediators to stimulate CD4 helper T-cell activities including B-cell activation, maturation and antibody production.5
Sequence identity between classical MHCI and MHCII is low (<20%), yet their 3D structures are strikingly similar. MHCI molecules are heterodimers composed of a single membrane-spanning α chain paired with the soluble β2 microglobulin (β2m) protein. The α chain has been divided into three distinct segments termed α1, α2 and α3. The α3 region has an immunoglobulin (Ig)-like fold, whereas the membrane-distal α1 and α2 segments form a peptide-binding cleft consisting of two α-helices overlying a floor formed by an eight-stranded antiparallel β-sheet (α1α2 domain). MHCII molecules are also heterodimers, but are composed of two subunits, each with a single membrane-spanning anchor. The extracellular segment of the α chain includes the α1 and α2 domains and, likewise, that of the β chain includes the β1 and β2 domains. In the mature MHCII molecule, the α2 and β2 domains fold as independent Ig-like domains, whereas α1 and β1 fold together, creating a single antigen-presenting platform with an architecture very similar to that of the α1α2 domain in MHCI molecules.6,7
In humans, MHC molecules are referred to as HLA, an acronym for human leukocyte antigens, and are encoded by the HLA region located on chromosome 6p21.3.8,9 The HLA segment is divided into three regions (from centromere to telomere): class II, class III and class I. Classical class I and class II HLA genes are contained in the class I and class II regions, respectively, whereas the class III region bears genes encoding proteins that are involved in the immune system but are not structurally related to MHC molecules. The classical HLA class I molecules are of three types, HLA-A, HLA-B and HLA-C. Only the α chains of these mature HLA class I molecules are encoded within the class I HLA locus, by the respective HLA-A, HLA-B and HLA-C genes. In contrast, the β2m chain is encoded by the β2m gene, which is located on chromosome 15. The classical HLA class II molecules are also of three types (HLA-DP, HLA-DQ and HLA-DR), with both the α and β chains of each encoded by a pair of adjacent loci. In addition to these classical HLA class I and HLA class II genes, the human MHC locus includes a long array of HLA pseudogenes as well as genes encoding non-classical MHCI and MHCII molecules.9 HLA pseudogenes are an indication that gene duplication is the main driving force for HLA evolution,10 whereas non-classical MHCI and MHCII molecules often subserve a restricted function within the immune system quite distinct from that of antigen presentation to αβ TCRs.11,12 Furthermore, in contrast to the classical HLA genes, non-classical HLA genes (with the exception of the somewhat polymorphic MICA and MICB genes) have a very limited number of allelic variants.
Genes encoding classical HLA molecules are extremely polymorphic, such that most classical HLA genes include a large number of allelic variants. Thus, the IMGT/HLA database†13 currently includes 1524 HLA allelic sequences (904 HLA I alleles and 620 HLA II alleles) (release 1.16, 14/10/2002). The basis for this extreme level of allelic diversity has been linked to evolving optimization of immune protection against pathogens. Consistent with this notion, most polymorphisms are associated with the peptide-binding residues of the α1α2 and α1β1 domains of HLA class I and HLA class II molecules, respectively.14 Moreover, the rate of non-synonymous substitutions (those that result in amino acid changes) is greater than the rate of synonymous substitutions (those that do not change the amino acid sequence) in codons specifying amino acids involved in peptide binding.15,16 In addition, population studies of HLA class I genes indicate that codons involved in peptide binding have greater heterozygosity than those with other functions.17 Reproductive selection mechanisms may also contribute toward shaping the HLA polymorphisms.18
Studies of HLA molecular variability at the amino acid level are limited,19 as most HLA polymorphism analyses have been performed at the nucleotide level. Thus, a comprehensive analysis of HLA amino acid variability in the context of the ever-increasing number of pMHC structures, some in complex with cognate TCRs, is missing. In this regard, we have carried out a detailed analysis of the amino acid sequence variability in each of the six distinct classical HLA molecules (HLA-A, HLA-B, HLA-C, HLA-DP, HLA-DQ, and HLA-DR) using a variability metric given by the Shannon entropy equation.20 We have combined this variability metric with sequence contact maps of the MHC binding to peptides, TCRs and CD8 and CD4 co-receptors obtained from the analysis of the available X-ray crystallographic structures. Interestingly, the pattern of variability for every MHC molecule distinguishes MHCI and MHCII and defines characteristic “fingerprints” for each subclass. The relevance of these distinct patterns of variability in MHC peptide-binding specificity, as well as in TCR restriction and alloreactivity, is discussed. The sequence variability analysis of the peptide-binding region of HLA molecules and its mapping onto the relevant 3D structures are available for visualization at the Molecular Immunology Foundation web site‡.
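As an illustration of the kind of variability metric referred to above, the following is a minimal sketch of a per-position Shannon entropy calculation over a multiple sequence alignment, H = -Σ p_i log2(p_i), where p_i is the fraction of sequences carrying amino acid i at that position. The use of base-2 logarithms, the exclusion of gap characters, the function names and the toy alignment are all assumptions made for this example and are not taken from the text; with 20 amino acid types, H under these assumptions would range from 0 (invariant position) to log2(20) ≈ 4.32.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gap characters are ignored rather than counted
    as an extra residue type.
    """
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    # H = -sum(p_i * log2(p_i)) over the residue types observed in the column
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Return the entropy of every column of an aligned set of sequences."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*sequences) iterates over the alignment column by column
    return [column_entropy(column) for column in zip(*sequences)]


# Hypothetical toy alignment, NOT real HLA sequences.
toy_alignment = ["GSHSM", "GSHSL", "GAHSM", "GTHSL"]
profile = alignment_entropy(toy_alignment)
for position, h in enumerate(profile, start=1):
    print(f"position {position}: H = {h:.2f}")
```

In this sketch, invariant columns score 0 and the score rises as more residue types share a position more evenly, so high-entropy positions could then be compared with residues that contact peptide, TCR or co-receptor in the available structures.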
HLA polymorphisms
HLA genes are extremely polymorphic, with most genes consisting of a large number of allelic variants (Table 1) specifying differences at the amino acid level and fine structural detail as revealed by X-ray crystallography comparison of PDBs (Table 2). The number of HLA allelic variants that diverge in at least one amino acid residue varies for the individual HLA genes, being greatest for HLA-B and DRB1 genes with 447 and 271 variants, respectively (Table 1). At the other extreme is the HLA-DRA
MHC sequences and multiple sequence alignments
Amino acid sequences of the human MHCI and MHCII alleles were collected from the IMGT/HLA database.13 Amino acid sequences were collected from functional genes with sequences from pseudogenes being discarded. Thus, amino acid sequences of the α and β chains of the HLA-DP molecule (HLA-DPA, and HLA-DPB, respectively) were obtained from the HLA-DPA1 and HLA-DPB1 genes. Likewise, the HLA-DQA1 and HLA-DQB1 genes were the only sources of the amino acid sequences for the α and β chains of the HLA-DQ
Acknowledgements
This work was supported by NIH grant AI50900 and the Molecular Immunology Foundation. We acknowledge the excellent scientific input provided by Dr Alfonso Valencia and comments from Drs Jia-huai Wang, Bruce Reinhold and Linda Clayton.
References (69)
Interactions of TCRs with MHC–peptide complexes: a quantitative basis for mechanistic models. Curr. Opin. Immunol. (1997)
Positive and negative selection of the αβ T cell repertoire in vivo. Curr. Opin. Immunol. (1991)
MHC superfamily structure and the immune system. Curr. Opin. Struct. Biol. (1999)
The differentiation and function of human T lymphocytes: a review. Cell (1980)
Antigen peptide binding by class I and class II histocompatibility proteins. Structure (1994)
Functions of non-classical MHC and non-MHC-encoded class I molecules. Curr. Opin. Immunol. (1999)
Function and polymorphism of human leukocyte antigen-A,B,C molecules. Am. J. Med. (1988)
HLA-DR and -DQ epitopes and monoclonal antibody specificity. Immunol. Today (1989)
A Shannon entropy analysis of immunoglobulin and T cell receptor. Mol. Immunol. (1997)
Structural basis of CD8 co-receptor function revealed by crystallographic analysis of a murine CD8αα ectodomain fragment in complex with H-2Kb. Immunity (1998)