Main

Telomeres are essential for genome stability and faithful chromosome replication. The chromatin structures associated with telomeric DNA mediate the many biological activities associated with telomeres, including cell-cycle regulation, cellular ageing, movement and localization of chromosomes within the nucleus, and transcriptional regulation of subtelomeric genes1,2. Specialized functions involving telomeric and subtelomeric DNA have evolved in several eukaryotes. For example, frequent subtelomeric gene conversion provides diversity for surface antigens in trypanosomes3, and rapidly evolving subtelomeric gene families confer selective advantages for closely related yeast strains4.

Human telomeres end with a stretch of the conserved simple repeat sequence (TTAGGG)n5. This tract is present at the end of all telomeres and therefore cannot be used to distinguish one telomere from another. To capture single-copy human DNA regions linked to telomeres that are useful for this purpose, we isolated large telomere-terminal fragments of human chromosomes using specialized yeast artificial chromosome (YAC) cloning vehicles called half-YACs6. Each half-YAC clone contains a large segment of subtelomeric DNA flanked by the cloning vector sequence at one end and the human telomere repeat sequence, which has been modified to operate as a functional yeast telomere in vivo, at the other. Characterization of these clones revealed low-copy subtelomeric repeats adjacent to the (TTAGGG)n sequence6,7. Physical mapping experiments on a large group of these half-YAC clones showed that, in most cases, they can stably maintain faithful copies of human telomere-terminal DNA fragments in yeast8. By contrast, bacterial artificial chromosome (BAC) libraries used to prepare the human working draft sequence are not expected to contain sequences extending to the telomere, owing to the absence of restriction sites in (TTAGGG)n, the effects of length associated with the construction of size-selected DNA recombinant clones, and the genomic instability of these regions9.

We used a combination of chromosome-specific single-copy sequences derived from the half-YAC clones and DNA end sequence derived from cosmid subclones of the half-YACs to connect most telomeres to the working draft sequence (Fig. 1). Our results show that the working draft sequence includes remarkably good coverage of human telomere regions. For the 24 human chromosomes, we analysed 46 telomere ends in all. The telomeres of the sex chromosome pair X and Y recombine meiotically, so these four telomeres are treated as two (designated the Xp/Yp pseudoautosomal telomere and the Xq/Yq pseudoautosomal telomere). We could integrate the working draft sequence with 32 telomere regions captured by half-YAC clones (blue dots). Of these 32 regions, 18 have working draft sequence coverage that includes DNA less than 50 kilobases (kb) from the telomere; for five of them, the sequence extends to the terminal (TTAGGG)n sequences10,11,12,13 (see Supplementary Information). Although we were unable to capture two telomeres (5p and 20q) in half-YAC clones, we identified these regions in subtelomeric repeat-containing BAC clones and used them to connect to the working draft sequence (green dots).

Figure 1: Summary of integration of telomeric DNA with working draft sequence.

The two human pseudoautosomal telomere pairs (Xp/Yp and Xq/Yq) each recombine meiotically, so each pair is treated as a single telomere. Blue: the working draft sequence extends into these 32 telomere regions defined by half-YAC clones. Green: the working draft sequence ends within these subtelomeric repeat regions, but the distance to the molecular telomere is not known. Red: the telomeric DNA has not yet been integrated with working draft sequence. Telomere clones for the five acrocentric short-arm regions (black rectangles) have not been characterized.

We were unable to connect 7 of the remaining 12 telomere ends to the working draft, either because the working draft sequence does not yet extend into these regions (2q, 7p, 17p, 17q and Xp/Yp) or because unambiguous identification of overlapping working draft sequence was prevented by repeat sequences in the telomere clones (19p and 19q). BAC or cosmid clones connected to each of these seven telomeres were identified during construction of the fingerprint-based clone map of the human genome14 (http://genome.wustl.edu/gsc/human/Mapping/index.shtml) and this is likely to facilitate the future integration of these telomeres into the working draft (see Methods). The five acrocentric chromosomes (13, 14, 15, 21 and 22) contain heterochromatic short arms comprising repeated DNA. We did not analyse these five short-arm telomeres (black rectangles) because their sequences were unstable in both yeast and bacteria, rendering them difficult to clone and characterize.
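The counts reported above can be cross-checked with a short tally. This is only an illustrative sketch: the numbers and chromosome-arm labels are copied from the text, whereas the variable names and the "13p"-style labels for the acrocentric short arms are our own shorthand, not designations used in the study.

```python
# Bookkeeping for the telomere ends described in the text.
# All counts come from the prose; names are illustrative only.

chromosomes = 24                      # 22 autosomes plus X and Y
raw_ends = chromosomes * 2            # two telomeres per chromosome
pseudoautosomal_pairs = 2             # Xp/Yp and Xq/Yq, each pair counted once
telomere_ends = raw_ends - pseudoautosomal_pairs

integrated_half_yac = 32              # blue in Fig. 1
linked_by_bac = ["5p", "20q"]         # green in Fig. 1
draft_not_extended = ["2q", "7p", "17p", "17q", "Xp/Yp"]
blocked_by_repeats = ["19p", "19q"]
# Shorthand labels (assumption) for the five acrocentric short arms.
acrocentric_short_arms = ["13p", "14p", "15p", "21p", "22p"]

not_connected = len(draft_not_extended) + len(blocked_by_repeats)
remaining = not_connected + len(acrocentric_short_arms)
accounted_for = integrated_half_yac + len(linked_by_bac) + remaining

print(telomere_ends, not_connected, remaining, accounted_for)
```

Under these assumptions, the categories sum to the total number of telomere ends analysed, with 7 unconnected ends and 5 uncharacterized acrocentric short arms making up the remaining 12.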

We have made available a detailed summary of the mapping experiments integrating telomeres with the working draft sequence, including specific telomere reagents, working draft contig designations, individual BAC clone accessions and accession numbers for our half-YAC-derived sequences (see Supplementary Information). For some chromosome ends, such as that of 11p (Fig. 2), we could precisely estimate the distance between the end of the working draft sequence and the telomere. However, this was not the case for many telomeric regions because much of the working draft sequence is still in small, unordered pieces.

Figure 2: Connecting the 11p telomere to the working draft sequence.

The end of the working draft sequence is represented by fragment AF015416 (magenta rectangle; a constituent of working draft contig 18272). The half-YAC clone yRM2209 (black) is represented below. Cosmid subclones were derived from the half-YAC clone and the contig of these subclones, which encompasses most of the half-YAC clone, was aligned by EcoRI restriction sites (small vertical marks). The cosmid end sequences (yellow triangles) were screened against the working draft sequence to orient the clones relative to the 11p telomere. This physical mapping enabled us to determine the size and features of the 45-kb telomeric region missing from the working draft sequence. Features include a 25-kb region of subtelomeric repeat DNA (green), an internal (TTAGGG)n telomere repeat sequence (red) and four Unigene clusters of ESTs (blue dots). Within the 140-kb region extending from the telomere, we determined the location of three known genes, IFITM2 (an interferon-induced transmembrane protein), PSMD13 (a component of the proteasome 26S subunit) and SIRT3 (a Sir-2-like histone deacetylase implicated in telomere maintenance), as well as the positions of unigene clusters of ESTs that match 11ptel sequences and are distinct from known genes (blue dots).

As part of this study, around 1.1 megabases (Mb) of half-YAC-derived DNA sequence was acquired from cosmid end sequencing as well as from draft and finished sequencing of some subtelomeric cosmids (see Supplementary Information). In addition to defining the overlap relationship between the working draft sequence and the telomere clones, the half-YAC-derived sequences were useful for sampling the subtelomeric regions not yet included in the working draft sequence but captured in the half-YAC clones. Preliminary analysis of the half-YAC-derived sequences and the regions of the working draft sequence that overlapped with the half-YACs revealed several interesting features.

The sizes of subtelomeric repeat regions adjacent to the terminal (TTAGGG)n varied widely among individual telomeres, from 8 kb at the 7q telomere to 300 kb at the 8p telomere. Large variations in subtelomeric repeat content have been detected near at least 18 telomeres8,15,16. Nonetheless, the scale of human genomic subtelomeric repeat content is now well defined, and it is clear that a significant part of the subtelomeric repeat region of the human genome is present in the working draft sequence.

Large subtelomeric repeat regions can cause false linkages in the BAC map and misassembly of working draft sequence. Large stretches of low-copy repeat DNA from subtelomeric repeat regions also localize to some pericentric chromosome regions, to the short-arm heterochromatin of acrocentric chromosomes and to a few loci in internal regions of chromosomes (for example, 1q42, 2q31, 4q28, 12p12 and Yq11.2). In previous iterations of the BAC map there were many instances of incorrect merges of subtelomeric repeat-containing BACs. To help identify these potentially problematic regions of the BAC map and working draft sequence, we have catalogued individual BAC clones containing segments of similarity with subtelomeric repeat regions (http://www.wistar.upenn.edu/Riethman). Inconsistencies between the current version of the BAC accession map (http://genome.wustl.edu:8021/pub/gsc1/fpc_files/freeze_2000_10_07/MAP/) and our telomere mapping studies are indicated in the Supplementary Information.

The abundance of low-copy repeat regions near telomeres is likely to make whole-genome shotgun assembly of subtelomeric regions virtually impossible. Indeed, previously characterized Drosophila subtelomeric repeat sequences are absent from its genome sequence17. By contrast, the entire sequence of yeast telomeric and subtelomeric regions was acquired using the half-YAC cloning strategy employed here18.

Internal telomere-like sequences, each consisting of around 50–250 base pairs (bp) of a mixture of perfect and imperfect copies of (TTAGGG)n11, were present in all subtelomeric repeat regions analysed. For example, multiple copies of internal telomere-like sequences were present in widely spaced parts of the 100-kb 18p subtelomeric repeat region, and were present in both orientations relative to the telomere. It is interesting to speculate that packaging of subtelomeric chromatin might involve interactions between the terminal (TTAGGG)n repeats and these internal telomere sequences. The TRF1 protein, which binds to (TTAGGG)n in vivo and can bind sequences corresponding to the short internal repeats in vitro19, would be a good candidate for mediating such interactions.
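As a rough illustration of how internal telomere-like tracts in both orientations might be located in a sequence, the following minimal sketch scans for runs of perfect TTAGGG repeats and of their reverse complement, CCCTAA. It is not the method used in this study: the function name, the `min_repeats` threshold and the toy sequence are assumptions made for the example, and imperfect repeat copies, which the text notes are also present, are not detected.

```python
import re

def find_telomere_like_tracts(seq, min_repeats=3):
    """Return sorted (start, end, strand) tuples for runs of perfect
    TTAGGG repeats ('+') or their reverse complement CCCTAA ('-').

    Coordinates are 0-based and half-open. Imperfect repeats are ignored.
    """
    seq = seq.upper()
    hits = []
    for motif, strand in (("TTAGGG", "+"), ("CCCTAA", "-")):
        # Match at least `min_repeats` consecutive copies of the motif.
        pattern = re.compile("(?:%s){%d,}" % (motif, min_repeats))
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)

# Toy sequence (hypothetical): one tract in each orientation.
toy = "ACGT" + "TTAGGG" * 4 + "GATTACA" + "CCCTAA" * 3 + "TT"
tracts = find_telomere_like_tracts(toy)
print(tracts)  # [(4, 28, '+'), (35, 53, '-')]
```

A more realistic scan would tolerate mismatches within repeat units, for example by scoring alignments against a (TTAGGG)n consensus rather than requiring exact matches.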

Preliminary analysis of the potential gene content of the subtelomeric regions encompassed by the half-YAC-derived sequences and the overlapping portions of the subtelomeric working draft sequence was done by searching for sequence matches between the genomic DNA sequences and potential gene-derived complementary DNA and expressed sequence tag (EST) sequences in GenBank (http://www.wistar.upenn.edu/Riethman). Even this preliminary analysis reveals two features of subtelomeric regions. First, there are many sequence matches with genes and ESTs in most subtelomeric regions. We detected about 500 matches to transcripts identified either by a full-length cDNA or by a unigene cluster of expressed sequences in the 40 telomere regions analysed; 62 of these were found from half-YAC sequences mapping distal to the working draft sequence. Second, many of the genes and potential genes identified by sequence matches are members of gene families with many pseudogene copies. The sequence matches included around 100 known genes, both unique and members of gene families.

Human subtelomeric sequences have been proposed to serve as a buffer between the terminal (TTAGGG)n sequences, which are needed to protect chromosome ends from fusion and recombination, and vital internal chromosomal sequences15. However, the many expressed sequences throughout subtelomeric regions, extending almost to the molecular telomere, suggest that these regions may serve essential functions and are not simply dispensable junk DNA.

Methods

We used a range of half-YAC-derived probes, including PCR- and cosmid subclone-derived probes and sequences (see Supplementary Information) and sets of collaboratively derived subtelomeric molecular and cytogenetic markers for specific telomeres20,21,22, to connect specific cloned chromosome ends with flanking BAC contigs, either by DNA hybridization and PCR experiments or by computer-based matches (using BLAST223 sequence alignment programs) of sequenced subtelomeric DNA with working draft sequence.

Single-copy probes from three of the seven telomeres not connected to working draft sequence could be used to identify BAC clones from an 11× coverage RP11 BAC library, although fewer clones than expected were identified (singleton BACs from the 2q and the 17p telomeres, and three BACs from the 7p telomere). Low-copy repeat sequences at the 19p, 19q and 17q telomeres complicated attempted BAC library screens for these chromosome ends. However, independent experiments have identified PAC and BAC clones connected to the 17q telomere22, and a detailed physical map of chromosome 19 (http://greengenes.llnl.gov/genome/) exists to help guide closure of the 19p and 19q subtelomeric gaps, which occur in duplicated regions containing a family of zinc finger-encoding genes. The remaining telomere region (Xp/Yp) is encompassed by a 500-kb clone contig extending to within a few kb of the telomere24.

Physical mapping experiments using a site-specific cleavage method (RARE cleavage8,25) have been done for 21 telomeres to demonstrate co-linearity of the half-YAC insert DNA with the cognate telomere. In the absence of RARE cleavage data, the presence of subtelomeric repeats adjacent to terminal (TTAGGG)n sequences in all of the designated half-YAC clones is taken as strong evidence for proximity to the telomere; this has been borne out by the RARE cleavage experiments carried out so far.

Half-YAC clones containing chromosome-specific DNA were not recovered from four chromosome ends. BAC and cosmid clones identified by virtue of their subtelomeric repeat content form the initial basis for the telomere linkages to 5p, 20q, 19q and Xp/Yp. The BAC clones used to mark the 5p and 20q telomeres and the cosmid used to mark the 19q telomere each contain an internal telomere repeat sequence and subtelomeric repeat sequences, and localize to telomeric ends of the BAC map (5p, 20q) and the chromosome 19 physical map (http://greengenes.llnl.gov/genome/). On the basis of the known sequence organization of other telomeres, only additional subtelomeric repeat sequence is likely to reside distal to the subtelomeric repeat segments contained in these clones, although the possibility of single-copy DNA distal to them cannot be formally excluded at present. A cosmid clone mapped to the Xp/Yp pseudoautosomal telomere using Bal31 exonuclease experiments26 forms the telomeric boundary of a large cosmid contig24 whose sequence is not yet available.