Abstract

Spinocerebellar ataxia type 8 (SCA8) is a neurodegenerative disorder caused by the expansion of a CTG trinucleotide repeat that is transcribed as part of an untranslated RNA. As a step towards understanding the molecular pathology of SCA8, we have defined the genomic organization of the SCA8 RNA transcripts and assembled a 166 kb segment of genomic sequence containing the repeat. The most striking feature of the SCA8 transcripts is that the most 5′ exon is transcribed through the first exon of another gene that is transcribed in the opposite orientation. This gene arrangement suggests that the SCA8 transcript is an endogenous antisense RNA that overlaps the transcription and translation start sites as well as the first splice donor sequence of the sense gene. The sense transcript encodes a 748 amino acid protein with a predicted domain structure typical of a family of actin-organizing proteins related to the Drosophila Kelch gene, and so has been given the name Kelch-like 1 (KLHL1). We have identified the full-length cDNA sequence for both the human and mouse KLHL1 genes, and have elucidated the general genomic organization of the human gene. The predicted open reading frame and promoter region are highly conserved, and both genes are primarily expressed in specific brain tissues, including the cerebellum, the tissue most affected by SCA8. Transfection studies with epitope-tagged KLHL1 demonstrate that the protein localizes to the cytoplasm, suggesting that it may play a role in organizing the actin cytoskeleton of the brain cells in which it is expressed.

Received 2 February 2000; Revised and Accepted 14 April 2000.

INTRODUCTION

We recently cloned a CTG expansion mutation that causes the dominantly inherited neurodegenerative disease spinocerebellar ataxia type 8 (SCA8) (1). We demonstrated that the SCA8 transcript is transcribed through the repeat only in the CTG orientation, as is the case for myotonic dystrophy (DM) (2), and not in the CAG orientation, as is found with the other dominantly inherited ataxias SCA1, SCA2, SCA3, SCA6 and SCA7 (3,4). In these latter diseases, the CAG expansion is translated into a polyglutamine tract that adds a toxic gain of function to the respective proteins, whereas the CTG expansions in DM and SCA8 are not translated. Despite intensive efforts to understand how the CTG mutation in the 3′ untranslated region (UTR) of the DMPK gene causes DM, a consensus has still not been reached as to why this mutation is pathogenic (5).

The clinical features of SCA8 are similar to those of the other SCAs and include limb and truncal ataxia, ataxic dysarthria and horizontal nystagmus (1,6), all of which are signs of a disturbance of the cerebellar system. Magnetic resonance image analysis of SCA8 patients showed substantial atrophy of the cerebellar vermis and hemispheres and relative preservation of the brainstem and cerebral hemispheres (6), a phenotype that is essentially indistinguishable from that of SCA6.

The transcripts containing the SCA8 CTG repeat are alternatively spliced and polyadenylated, and are expressed primarily in various brain tissues (1). No extended open reading frames (ORFs) are present in any of the SCA8 splice variants that have been identified. During the isolation of the SCA8 transcript we unexpectedly found two partial cDNAs generated from mRNA transcribed in an orientation opposite to that of the SCA8 transcript. This gene arrangement suggests that the SCA8 transcript is an endogenous antisense RNA. We have now identified the full cDNA sequence of this overlapping transcript from both human and mouse and report that the conserved ORF is highly homologous to the Drosophila Kelch protein (7). We have therefore named this new gene Kelch-like 1 (KLHL1).

RESULTS

Genomic organization of the SCA8/KLHL1 region

The sequences of the SCA8 splice variants we isolated using rapid amplification of cDNA ends (RACE) procedures (8) indicated that the SCA8 transcript consists of either three or four exons (D-C-B-A or D-C-A) and appeared to have an alternative 5′ exon (D′-C-A) (1). In order to define the actual intron–exon genomic organization of the SCA8 gene, we used single site polymerase chain reaction (PCR) (9) to obtain and sequence genomic fragments spanning each of the predicted exon–intron junctions and used long-range genomic PCR to estimate the size of each intron. Working draft sequences of three bacterial artificial chromosome (BAC) clones (RPCI-11 121J6, 7O24 and 20M9) containing the SCA8 repeat have also recently become available through the Human Genome Project, and we have used these data in conjunction with our sequence data to assemble a 166.4 kb genomic sequence from the SCA8 region of chromosome 13. The genomic organization of the SCA8 transcripts derived from these data is summarized in Figure 1, and the intron–exon junctions are shown in Table 1.

The longest SCA8 cDNAs we have identified consist of five exons derived from a genomic DNA region slightly over 32 kb in length. A segment of contiguous cDNA sequence that had appeared to be a single exon (C) is actually spliced from two genomic exons (C1 and C2). The most 5′ exon in these cDNAs is derived from transcripts spliced at either of two alternative donor splice sites (D or D′). The genomic sequence between the D and D′ donor sequences is highly G/C rich and has proven to be a poor template for PCR, a fact that we believe explains our inability to obtain RACE products extending through this region. This G/C rich region has a high CpG content and may be a short CpG island. Because the genomic sequence immediately 5′ of the longest SCA8 cDNAs does not contain promoter elements, we believe that we have not yet identified the true 5′ end of the SCA8 transcript, and we are continuing our efforts to do so.
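As a rough consistency check on the "slightly over 32 kb" figure, the intron sizes listed in Table 1 can be summed along the longest splice path. The sketch below is illustrative only: the dictionary and function names are hypothetical, and exon lengths are not included (they are shown only in Figure 1), so the totals are lower bounds on the genomic span.

```python
# Intron sizes (bp) transcribed from Table 1 for the SCA8 transcript.
# Keys are (upstream exon, downstream exon); the names are illustrative.
INTRONS_BP = {
    ("D", "C2"): 7399,
    ("D'", "C2"): 6653,
    ("C2", "C1"): 14487,
    ("C1", "B"): 794,
    ("B", "A"): 8396,
}


def intronic_span(path):
    """Sum the intron sizes along a chain of consecutively spliced exons."""
    return sum(INTRONS_BP[(up, down)] for up, down in zip(path, path[1:]))


# Five-exon transcripts starting from either alternative donor site.
span_from_d = intronic_span(["D", "C2", "C1", "B", "A"])
span_from_d_prime = intronic_span(["D'", "C2", "C1", "B", "A"])

print(f"Introns only, from D:  {span_from_d} bp")
print(f"Introns only, from D': {span_from_d_prime} bp")
```

Under these assumptions the introns alone account for about 31 kb when splicing from site D, which leaves roughly 1 kb for the five exons and is in line with the stated genomic span.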

We performed a homology search (10,11) of the EST sequence database using the genomic SCA8 sequence and identified a single SCA8 cDNA sequence. This 3′ EST sequence matched the SCA8 genomic sequence 3′ of the exon B donor splice site. We obtained this cDNA clone (IMAGE 2067260, total fetal library), sequenced the complete cDNA insert and found that it was generated from a transcript that terminated at an ATTAAA polyadenylation site downstream of the exon B donor sequence. This SCA8 splicing variant, which does not contain the SCA8 CTG repeat region, utilized yet another alternative D donor splice site (D′′). The general organization of this cDNA is shown in Figure 1.

A single 5′ EST sequence generated from a mRNA transcribed in the opposite orientation to the SCA8 transcripts (Fig. 1) was also found in this database search. We were unable to obtain this KLHL1 cDNA clone (GEN-419D02, fetal brain library) for further sequencing or analysis. We were able to confirm that the KLHL1 transcript encompasses this EST sequence, however, by performing PCR on cerebellar cDNAs with a primer that bound 5′ of this sequence and a primer specific for the second KLHL1 exon (see Materials and Methods).

To obtain an accurate and complete sequence of the overlapping KLHL1 gene, we performed high-fidelity RT–PCR using primers from exon D of the SCA8 gene and a primer near the poly(A) end of a partial KLHL1 cDNA and sequenced the resulting products. We have used this sequence to establish a BAC contig that spans the genomic length of the KLHL1 gene and to identify two working-draft BAC sequences (RPCI-11 45C1 and 394C3) that contain the last six KLHL1 exons. Our sequence and BAC data indicate that the KLHL1 gene is composed of 11 exons and has a minimum genomic size of over 400 kb (Fig. 2), which is remarkably large for a gene for which the spliced mRNA is less than 4 kb in length. We have assembled over 114 kb of sequence from the first intron of the gene, and have estimated the size of this intron to be ∼140 kb.

We performed a homology search of the EST sequence database using the KLHL1 cDNA sequence and identified sequences from four KLHL1 cDNA clones. Three of these clones are from fetal brain cDNA libraries and one is from an infant brain library. One of these EST sequences has been used to develop the sequence tag site (STS) marker WI-6558 and has been assigned the UniGene identification number Hs.106808 (12).

The KLHL1 transcript and promoter are conserved in mouse

To obtain mouse KLHL1 cDNA and promoter sequence, we amplified portions of the mouse KLHL1 coding region using both mouse genomic DNA and mouse brain cDNA as templates for low stringency PCR with primers designed from the human sequence. We then sequenced these products and designed new PCR primers based on mouse KLHL1 sequences. These mouse KLHL1 primers were used in PCR-based methods to generate both full-length mouse KLHL1 cDNA clones and genomic fragments 5′ of the gene (Materials and Methods).

Northern and dot-blot analyses indicate that human KLHL1 is expressed primarily in several brain tissues, including the cerebellum, substantia nigra, frontal lobe and medulla (unpublished data). To determine if the expression pattern of the mouse KLHL1 gene was similar to that of the human gene, we probed northern blots of poly(A)+ RNA isolated from eight different human or mouse tissues with probes specific for human or mouse KLHL1, respectively (Fig. 3A and B). KLHL1 transcripts were detected in mRNA isolated from whole brain in both human and mouse, but were not detected in mRNA isolated from any of the other tissues analyzed. To further confirm this finding, we performed a KLHL1 PCR assay of normalized amounts of first-strand cDNA from various mouse adult tissues and total embryos (not shown). The mouse KLHL1 transcript was once again detected only in the brain, and was present in the cDNA from total mouse embryo at detectable levels by day 11. Mouse KLHL1 sequences are present in the EST databases from two medulla oblongata cDNA clones, and from one diencephalon cDNA clone, indicating that the mouse gene is expressed in these brain tissues. We performed KLHL1 RT–PCR analysis of RNA isolated from a mouse cerebellum and determined that KLHL1 is expressed in this tissue as well (Fig. 3C).

A comparison of the human and mouse genomic sequences spanning the first KLHL1 exon is shown in Figure 4. We analyzed these sequences with promoter prediction programs (13,14) and found that the predicted KLHL1 core promoter is conserved in both the mouse and human sequence. The predicted transcription start site for KLHL1 is consistent with the fact that we were able to obtain RT–PCR products from mouse KLHL1 cerebellar RNA using primers designed to amplify a transcript beginning at the predicted start site (Fig. 3C, lanes 3 and 4), whereas primers designed to anneal as little as 14 nt 5′ of this site did not generate RT–PCR products (Fig. 3C, lanes 1 and 2).

Both the mouse and human KLHL1 sequences encode an ORF that begins at the same start codon. The nucleotide sequence conservation within this first exon ORF is significantly higher than in the 5′ UTR of the gene (85% nucleotide identity in the ORF, 62% identity in the 5′ UTR). The two genes share an identical splice donor sequence at the end of the first KLHL1 exon, and the sequence conservation drops considerably within the intron sequences.

In the antisense orientation, the SCA8 D′′ splice donor site is not conserved in mouse, the core GT nucleotides of the D splice donor site are conserved, and the full sequence of the D′ splice sequence is conserved. Conservation of this known SCA8 splice site in the mouse genome indicates that a KLHL1 antisense transcript may also be present in mouse, but we have not yet attempted to isolate this RNA.

The domain organization of KLHL1 indicates that it is an actin-binding protein

The predicted amino acid (aa) sequence of mouse and human KLHL1 protein is shown in Figure 5. The human protein has 748 aa and the mouse has 751 aa, with 95% of the aa conserved between the two proteins. The KLHL1 protein has a high degree of homology to the well characterized actin-binding Kelch protein from Drosophila (7). The Kelch protein is an actin-binding component of ring canals, which are required for cytoplasm transport from nurse cells to the oocyte during oogenesis. We have used the homology to Kelch as the basis for predicting a domain structure for the KLHL1 protein.

KLHL1 has the POZ (15) (also called BTB) (16) protein–protein dimerization domain present in Kelch and in a number of zinc finger proteins, and has the six Kelch motif repeats (KREPs) (17) that constitute the actin-binding domain of Kelch (7) and other Kelch-related proteins. The intervening aa sequence (IVS) between these two domains is also similar to that present in Kelch. The amino-terminal region (NTR) of KLHL1 is not homologous to any other known protein. The distribution of functionally conserved aa between the mouse and human KLHL1 proteins is reflected in this predicted domain structure, with 84% of the aa conserved in the NTR (166/196) but over 98% conserved aa in the POZ, IVS and KREP domains (106/108, 156/158 and 284/289, respectively).
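The per-domain counts quoted above can be turned into percentages and combined into an overall figure. The following is a minimal sketch, assuming the denominators are aligned lengths for each domain (they sum to 751, the length of the mouse protein); the variable names are illustrative.

```python
# (conserved residues, aligned length) per predicted domain, as quoted in the text.
DOMAINS = {
    "NTR": (166, 196),
    "POZ": (106, 108),
    "IVS": (156, 158),
    "KREP": (284, 289),
}


def percent(conserved, total):
    """Return conservation as a percentage."""
    return 100.0 * conserved / total


per_domain = {name: round(percent(c, t), 1) for name, (c, t) in DOMAINS.items()}

total_conserved = sum(c for c, _ in DOMAINS.values())
total_length = sum(t for _, t in DOMAINS.values())
overall = round(percent(total_conserved, total_length), 1)

for name, value in per_domain.items():
    print(f"{name}: {value}%")
print(f"Overall: {total_conserved}/{total_length} = {overall}%")
```

Under that assumption the four domains together give 712/751 conserved residues, which agrees with the 95% overall conservation reported for the two proteins.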

The domain organization of KLHL1 is highly similar to the general domain structure of two other brain-specific, actin-binding proteins: NRPB (18) (also called ENC-1) (19) and KLHL2 (also called Mayven) (20). NRPB, which was identified as a specific molecular marker of neural induction in vertebrates (19), has been shown to participate in neuronal process formation and is believed to be a nuclear matrix protein. The KLHL2 protein is a cytoplasmic protein that is thought to play a role in the dynamic organization of the cytoskeleton of neurons.

KLHL1 is primarily a cytoplasmic protein

To determine the subcellular localization of the KLHL1 protein, we fused the full-length KLHL1 ORF to an amino-terminal epitope tag in an expression vector in which it is expressed from the human cytomegalovirus (CMV) immediate–early promoter. This construct was then transiently transfected into COS-1 cells, which were then grown for 2 days, immunostained using antibodies specific for the epitope tag, and analyzed with a confocal microscope system. The results of this experiment are shown in Figure 6. The fused KLHL1 was fairly evenly distributed throughout the cytoplasm of transfected cells, and was not detectable in the nucleus. This subcellular localization is similar to that seen with KLHL2 (20). Because these results were obtained with epitope-tagged KLHL1 protein, we plan to confirm that unaltered KLHL1 is also localized to the cytoplasm using antibodies to KLHL1 (in progress).

DISCUSSION

As a step towards understanding the molecular pathology of SCA8, we have defined the genomic organization of the RNA transcripts that contain the SCA8 CUG repeat tract. These transcripts are alternatively spliced, contain up to five exons, and span a genomic region of over 32 kb. The SCA8 CUG repeat is in the 3′ terminal exon of these transcripts, although we also identified transcripts that have an alternative 3′ terminal exon and so do not contain the SCA8 repeat. We have assembled over 100 kb of genomic sequence in the 5′ region of this gene, but we have not yet determined a precise transcription start site. None of the SCA8 splice variants have a significant ORF, and so we do not believe that these transcripts function as mRNAs. Rather, the SCA8 transcripts are organized as natural antisense transcripts of the mRNA that encodes the KLHL1 protein.

We have identified the full-length cDNA sequence for both the human and mouse KLHL1 genes, and have elucidated the general genomic structure of the human gene. By comparing the human and mouse cDNAs and DNA sequences from the genomic regions containing the first KLHL1 exon, we were able to determine that the predicted core promoter and ORF of the KLHL1 gene have been functionally conserved between these two species. The genomic sequence at the first exon–intron splice junction is also conserved, as is the major splice donor sequence in the antisense (i.e. SCA8) orientation. By identifying these features of the KLHL1 gene, we have determined that the SCA8 antisense RNA is transcribed through the transcription start site, the translation start site, and the first splice junction of the KLHL1 gene. Because of the extent and nature of the overlap between these two transcripts, the SCA8 antisense transcript could potentially regulate the expression of the KLHL1 gene by altering its transcription, translation, splicing or transcript stability.

Regulation of gene expression by antisense RNA is well established in prokaryotic systems (21), and there are a growing number of examples of natural antisense RNAs in eukaryotic organisms (22). In eukaryotes, antisense transcripts have been shown either to be expressed in competition with sense transcripts (23), or to pair with the homologous sense transcript and affect post-transcriptional events such as splicing (24), RNA transport (25), cytoplasmic stability (26–28) and possibly translation (29). Although we have not yet demonstrated that any of these aspects of KLHL1 expression are affected by the SCA8 transcript, the genomic arrangement of the untranslated SCA8 transcript in relation to KLHL1 is strongly indicative of its role as a potential regulator of this gene. In particular, the large size of the first KLHL1 intron may make the KLHL1 RNA particularly sensitive to splicing inhibition by the SCA8 antisense RNA (24). Understanding the nature of the possible interactions between these two transcripts as well as the biological role of KLHL1 may therefore be critical for understanding both the normal function of the SCA8 transcript and the molecular pathology of spinocerebellar ataxia type 8.

The human KLHL1 gene is expressed primarily in various brain tissues, including the cerebellum, the tissue most affected by SCA8 (6). Our expression data to this point with the mouse KLHL1 gene indicate that it is also specifically expressed in the cerebellum and other brain tissues, and that there is a relatively higher level of expression in fetal tissue than in the adult brain. Consistent with this is the fact that all of the human KLHL1 cDNA clones in the EST database were from either fetal or infant brain libraries. These observations may indicate that the KLHL1 protein plays a more active role in the developing brain than in adult brain tissue.

The predicted domain structure of the KLHL1 protein is characteristic of a number of proteins that bind actin, can form dimers and are in general thought to serve as actin-organizing proteins (7,18–20). Based on this homology we expect that KLHL1 will have properties that are similar to these proteins, although we have not yet experimentally demonstrated that KLHL1 either dimerizes or binds actin. We are currently performing experiments that address these issues (in progress). We have shown that the KLHL1 protein is localized to the cytoplasm, and so we speculate that it may play a role in organizing the actin cytoskeleton of the brain cells in which it is expressed. The cellular localization, domain structure and general expression pattern of KLHL1 are very similar to those of the recently described KLHL2 (Mayven) protein (20), and these proteins may perform similar cellular functions.

We do not yet know why the CTG expansion that causes SCA8 leads to cerebellar degeneration in affected individuals. We have assembled over 165 kb of genomic sequence in the SCA8 region, and SCA8 and KLHL1 are the only transcripts that we have been able to identify within this region. Since both of these genes are expressed in the cerebellum, the pathogenic effect of the expansion may be mediated either directly or indirectly through one or both of these transcripts. The promoters for these transcripts are both located over 31 kb from the expanded repeat, so we do not think that the expansion would directly alter the transcription of these genes. However, pathogenic expansions in the SCA8 antisense RNA may alter its stability or processing, which could in turn affect the expression of the KLHL1 gene in a dominant manner. If the CUG expansion leads to an accumulation of the SCA8 transcript, this could negatively affect expression from both of the KLHL1 alleles through an antisense interaction with the KLHL1 transcripts. Alternatively, the CUG expansion may prevent the SCA8 transcript from negatively regulating KLHL1 expression, and the resulting over-expression of the KLHL1 protein could be toxic to cerebellar tissue and result in ataxia.

MATERIALS AND METHODS

PCR and RACE reactions

Genomic intron junction sequences were obtained by performing single-primer PCRs with primers designed from the SCA8 cDNA sequence and the previously described RBgl24 primer (30). KLHL1 cDNA for use as a PCR template was generated using SuperScriptII (Gibco BRL, Rockville, MD), mRNA from substantia nigra brain tissue (Clontech, Palo Alto, CA) and a KLHL1-specific primer (KLHL23R, GGA CAT TGT GTA ATG TTT CCA CT). The coding portion of the KLHL1 cDNA was generated using a PCR with a primer near the 3′ end of the cDNA (EcoRVend, CCT GAT ATC TGG GCG ATG AGA ATA TGA AGT CTG), a primer designed from exon D of SCA8 (KpnFull, TGC GGT ACC CAT GTC AGG CTC TGG GCG AAA AG) and the high-fidelity polymerase Pfu Turbo (Stratagene, La Jolla, CA), using the reaction buffer supplied and recommended conditions. The 5′ UTR of the KLHL1 cDNA was amplified using a primer in the gene’s second exon (F23R, TTG AAT GGC CGG GTT GAT GAC AG) and a primer that anneals 54 nucleotides 5′ of the 5′ KLHL1 EST sequence D61571 (D125, TGG GGC TCT TTC TCT CTG CGC TCT C).

Mouse KLHL1 sequence was obtained by performing a PCR using mouse genomic DNA and a human KLHL1 primer pair (E22R, CTG CTG AGT GCC CTG CCC AGG AG and D23, ACC CAG CCA GAG TCG CCT GCT CA) specific for the first exon of the gene, and a similar reaction using mouse brain cDNA (Clontech) and another human primer pair (F27, ATG CTG AGC AAA CCT TCA GAA AGA TGG and H28R, TGT GTA GCT GTG GGC CAC CTT CAT TAA C). Primers specific for mouse KLHL1 sequence were designed from this sequence, and PCRs were performed using a mouse primer (musE22R, TTG CTG CAG CCT CGT GGC AAC T) and a human primer (C123, CTT GAC AGC TTC ACA GGC GGG CT) to obtain promoter and 5′ UTR sequence. PCR with a mouse primer pair (mus2-22, AGT TGC CAC GAG GCT GCA GCA A and mus3–24R, TCC TTG AGC ATC AGC AAA TGC CCT) joined the two sequences previously generated. Marathon-Ready cDNA reactions using the mouse brain cDNA (Clontech) and mouse primers (mus2-22 and mus3-25, GCA TAG ACC CAA ATG CAC TCT GGG A) were performed as described (1) to obtain the 3′ end of the cDNA. Additional sequences flanking the first exon were obtained using mouse primers and RBgl24 primer in single-primer PCRs.
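The relationship between two of the mouse primers listed above can be checked directly. The sketch below is a simple string comparison, not a description of how the primers were designed: it tests whether musE22R is the exact reverse complement of mus2-22, which would mean the two primers anneal to the same site on opposite strands. The helper name `reverse_complement` is illustrative.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Primer sequences as printed in the text (5' to 3').
MUS2_22 = "AGT TGC CAC GAG GCT GCA GCA A"
MUSE22R = "TTG CTG CAG CCT CGT GGC AAC T"

same_site = reverse_complement(MUS2_22) == MUSE22R.replace(" ", "")
print("musE22R is the reverse complement of mus2-22:", same_site)
```

The comparison holds for the sequences as printed, which fits the description of mus2-22 being used to join the promoter/5′ UTR product to the downstream coding sequence.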

Northern blot analysis and RT–PCR

Human and mouse multiple tissue northerns (Clontech) were used for northern analysis. A 654 bp cDNA probe from the 3′ untranslated region of the human KLHL1 transcript was generated by PCR with primers RCL-R1 (ATG CAG CTT TGA TTA GTA GGA CAG T) and MH4-23 (TGG GAG AGC AGG TGC CTG TGT GG), and a 605 bp probe from the mouse KLHL1 coding region was generated using primers mus3-26 (GCA ACA GCA GCT CTG TGA TGT CAT CC) and MHKE3-1R (GTG GTG GAA GCA GTG GCA). The probes were random prime labeled (Gibco BRL) and hybridized to the respective blot using express hyb hybridization solution (Clontech). Manufacturers’ recommendations were used for hybridization and washes. The human northern blot had been probed previously with an unrelated probe and was stripped prior to this rehybridization as recommended by the manufacturer. A mouse primer pair that crossed an exon–exon junction (mus3-25 and mus3-24R) was also used as an assay for the presence of KLHL1 mRNA in various mouse tissues using a panel of normalized first-strand cDNAs [mouse multiple tissue cDNA panel (Clontech)].

Total RNA was isolated from a mouse cerebellum using TRIzol Reagent (Gibco BRL) and treated with RQ1 RNase-free DNase (Promega, Madison, WI). First-strand cDNA was generated from this RNA using the KLHL1-specific primer MHKE3-1R and the superscript first-strand synthesis system for RT–PCR (BRL). PCR was performed with the reverse primer mus3-24R paired with either mus7-23 (TAC TGT GAG GAC CTT GAC AGC TT), mus5-27 (CAA ACT GAC TAT ATA AAA CCG CCC CTT), mus7-21 (AAG CAT TTG ACT GTC CTT CGG) or mus2-22 (94°C 45 s, 55°C 45 s, 72°C 1.5 min, 35 cycles). Control reactions without added RT were performed in parallel with these reactions, as were positive PCR control reactions for each of the different forward primers.

Sequence analysis

Blast homology searches (11) were performed using web-based programs available at www.ncbi.nlm.nih.gov, and the programs promoter prediction by neural network (13) (www.fruitfly.org/seq_tools/promoter.html) and TSSG/W (14) (dot.imgen.bcm.tmc.edu:9331/gene-finder/gf.html) were used to identify the KLHL1 core promoter. DNA and protein sequence alignments were performed using the seqweb program GAP (GCG). The working draft sequences from BACs 7O24, 121J6 and 20M9 were assembled into a complete contiguous genomic sequence by using the Blast program to compare these sequences with each other and with the KLHL1 cDNA and the genomic sequences we had compiled. We closed the remaining gaps in the sequence by making PCR primers that generated products across the gaps and sequencing these products.

Subcellular localization of KLHL1

The coding region KLHL1 cDNA PCR product described above was cloned into KpnI/EcoRV-digested plasmid pcDNA6/HISA (Invitrogen) to generate a construct in which the Xpress epitope tag is fused to full-length KLHL1 cDNA. This construct was transfected into COS-1 cells grown on glass cover slips using a calcium phosphate method (31). Briefly, the cells were plated at 3 × 10⁴ cells/cm² on glass cover slips in a six-well plate and were allowed to attach to the cover slip overnight. A DNA/CaCl2 precipitate was formed by adding 50 µl of 2.5 M CaCl2 to 20 µg DNA in 450 µl sterile water and then adding 500 µl of 2× HBS (280 mM NaCl, 50 mM HEPES acid, 1.5 mM Na2HPO4). A 120 µl aliquot of the precipitate was distributed evenly over a well and gently agitated to mix precipitate and medium. The cells were incubated for 16 h under standard growth conditions, the medium was removed, the cells were washed twice with 2 ml sterile phosphate buffered saline (PBS) and then complete medium was added. The cells were cultured for 2 days after transfection (DMEM, 10% fetal bovine serum) and the cover slips were removed from the wells. The cells were fixed in 3.7% formaldehyde for 10 min, washed in PBS for 5 min, then stained for 1 h at 37°C with primary anti-Xpress mouse monoclonal antibody (1:200, Invitrogen). The cover slips were then washed with PBS and stained for 1 h at 37°C with the secondary anti-mouse antibody conjugated to CY5 [1:200 (Jackson ImmunoResearch Laboratories, West Grove, PA)]. After a final wash in PBS, the cover slips were mounted on slides (GG-1, Sigma, St Louis, MO) and were examined using a Bio-Rad (Hercules, CA) MRC-100 confocal microscope equipped with a krypton–argon laser.
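The volumes given for the calcium phosphate precipitate imply the final concentrations in the mix and the amount of DNA delivered per well. The sketch below works through that arithmetic; it assumes the 20 µg of DNA is contained within the 450 µl of water (so the total volume is 1 ml), and all variable names are illustrative.

```python
# Volumes and concentrations as stated in the protocol.
CACL2_STOCK_M = 2.5      # mol/l
CACL2_UL = 50.0          # µl of CaCl2 stock
DNA_UG = 20.0            # µg of plasmid DNA
WATER_UL = 450.0         # µl sterile water (assumed to include the DNA)
HBS_2X_UL = 500.0        # µl of 2x HBS
PER_WELL_UL = 120.0      # µl of precipitate added per well

# 2x HBS components in mM.
HBS_2X_MM = {"NaCl": 280.0, "HEPES": 50.0, "Na2HPO4": 1.5}

total_ul = CACL2_UL + WATER_UL + HBS_2X_UL

# Final CaCl2 concentration in the precipitate mix, in mM.
cacl2_final_mm = CACL2_STOCK_M * 1000.0 * CACL2_UL / total_ul

# Final HBS component concentrations after dilution, in mM.
hbs_final_mm = {name: conc * HBS_2X_UL / total_ul for name, conc in HBS_2X_MM.items()}

# DNA delivered to each well, in µg.
dna_per_well_ug = DNA_UG * PER_WELL_UL / total_ul

print(f"Total volume: {total_ul:.0f} µl")
print(f"CaCl2: {cacl2_final_mm:.0f} mM")
print(f"HBS components (mM): {hbs_final_mm}")
print(f"DNA per well: {dna_per_well_ug:.1f} µg")
```

Under these assumptions the mix is 1× HBS with 125 mM CaCl2, and each 120 µl aliquot carries about 2.4 µg of DNA.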

GenBank accession numbers

Contiguous genomic DNA sequence in the SCA8/KLHL1 overlap region (166 358 bp), AF252279; human KLHL1 cDNA, AF252283; human KLHL1 segmented genomic DNA, AF252271; mouse KLHL1 cDNA, AF252281; mouse KLHL1 promoter/first exon genomic sequence, AF252282; complete insert sequence of human KLHL1 antisense cDNA clone (IMAGE clone 2067260, EST AI803351), AF252280; working draft sequences for RPCI-11 BACs AC009221 (45C1), AC013772 (7O24), AC013803 (20M9), AL160391 (121J6) and AL162378 (394C23); human KLHL1 EST sequence, D61571, D81773, C14943 and D60372 (5′ and 3′ ends of GEN-102H05, respectively), F05568 and Z38538 (5′ and 3′ ends of c-0fa11, respectively); mouse KLHL1 ESTs, AV331743, AV329217 and AV382286.

ACKNOWLEDGEMENTS

We thank members of Harry Orr’s laboratory: Lisa Duvick for providing COS-1 cells and assistance with transfection and culturing protocols, and Cynthia Vierra and Tao Zu for assistance with the immunofluorescence procedures. This work was funded by grant NS36282 from the NINDS/NIH.

+To whom correspondence should be addressed. Tel: +1 612 626 4521; Fax: +1 612 626 2600; Email: koobx001@gold.tc.umn.edu

Figure 1. Intron–exon organization of SCA8 transcripts. The top line represents genomic DNA. Functional splice donor sequences are indicated with a circled GT, functional splice acceptor sites are indicated with a boxed AG, and sizes of exons and introns are shown above the line. Sites D, D′ and D′′ are alternative splice donor sequences of the most 5′ exon. The locations of polyadenylation (polyA) sites and of the unstable SCA8 CTG repeat are also indicated. Dashed lines represent observed splicing events that generated SCA8 RNA, indicated by the second line. The relative positions of the cDNA clone sequences present in the EST databases are indicated by the bottom two lines. EST AI803351 is from a cDNA clone that is a splice variant of the SCA8 RNA that terminates in an extended exon B rather than in the CTG-containing exon A. EST D61571 sequence is from a mRNA transcribed in the opposite orientation from the other RNAs represented in this figure (i.e. D61571 is from a KLHL1 cDNA).

Figure 2. General genomic organization of the SCA8/KLHL1 region of chromosome 13. The contiguous horizontal line represents genomic DNA, and the waved lines represent KLHL1 or SCA8 RNA, as indicated. Exons are represented by rectangles and the minimum intron sizes in the KLHL1 gene are shown above the line (when known). The position of the SCA8 CTG repeat (CUG), the polymorphic markers D13S1296 and D13S318, and the STS WI-6558 are indicated with vertical lines. The relative positions of BAC clones that span this region are shown below the corresponding genomic DNA section, with heavy lines representing BAC clones for which working draft sequence is available. The exons or markers present in each BAC are indicated, and the BAC name and size are shown below. The precise ends of the two BAC clones represented by dashed lines have not been defined. All BAC clones are from the RPCI-11 human male BAC library.

Figure 3. Tissue-specific expression of the KLHL1 gene in human and mouse. Northern blots of poly(A)+ RNA isolated from the eight different (A) human or (B) mouse tissues indicated above each blot were hybridized with probes specific for human or mouse KLHL1 mRNA, respectively. KLHL1 transcripts of the expected size were detected in mRNA isolated from whole brain in both human and mouse, but were not detected in mRNA isolated from any of the other tissues analyzed. (C) RT–PCR with KLHL1-specific primers was performed on total RNA isolated from a mouse cerebellum. First-strand cDNA was generated using a primer specific for the fifth KLHL1 exon, and PCRs were then performed using a reverse primer from the fourth KLHL1 exon paired with a forward primer that would anneal either 5′ of the expected transcription start site (primers/lanes 1 and 2), at the expected start site (primer/lane 3), or within the coding region of the first exon (primer/lane 4), as indicated. PCR products were detected using forward primers that would amplify the expected KLHL1 transcript (lanes 3 and 4) but were not detected using primers 5′ of the predicted transcription start site (lanes 1 and 2). These results indicate that KLHL1 is expressed in mouse cerebellum and confirm that transcription begins at approximately the expected location.

Figure 4. Comparison of the KLHL1 promoter and first exon genomic sequence from human and mouse. An alignment of sequence from the genomic DNA that spans the KLHL1 promoter and first exon is shown, with the human sequence on top and the mouse sequence below. Nucleotide bases that are conserved are indicated by vertical lines, and gaps in the alignment are indicated by dots. The sequence was abbreviated in two places, with the total number of nucleotides, the number of matching nucleotides and the number of gaps in the omitted sequences shown. The KLHL1 ORF is shaded, and the predicted TATAA box and transcription start site for KLHL1 are indicated. The conserved splice donor sites for KLHL1 and SCA8 (antisense) are boxed, and the consensus GT pairs at the intron borders are shaded. The SCA8 splice donor site D is shown for the human sequence, but this sequence is only partially conserved in the mouse.

Figure 5. The predicted domain organization and amino acid sequence comparison of human and mouse KLHL1 proteins. The protein sequence for human KLHL1 is shown on the top line, and the differences in the mouse KLHL1 protein sequence are indicated below, with vertical lines representing identical amino acids. Dots are used as spacers in the human sequence for the three additional amino acids present in the mouse protein. The POZ (also called BTB) dimerization domain and the six kelch repeats (KREPs) characteristic of actin-binding proteins are shaded. These two domains, as well as the intervening sequence (IVS) and amino-terminal region (NTR), are labeled at the side of the amino acid sequence. The individual KREPs are separated by vertical lines and numbered above the sequence.

Figure 6. KLHL1 is primarily a cytoplasmic protein. A plasmid construct in which full-length KLHL1 was fused to an amino-terminal epitope tag was transiently transfected into COS-1 cells. The cells were analyzed by immunofluorescence using an antibody specific for the epitope tag. In the panel on the left, a transfected cell is shown in a field containing untransfected cells (internal negative staining controls), and a larger view of a transfected cell is shown in the panel on the right. The tagged KLHL1 protein was found in the cytoplasm but not in the nucleus (unstained area) of the transfected cells.

Table 1.

Splice donor, splice acceptor and poly(A) sites for SCA8 and KLHL1 transcripts.

SCA8

| Exon | Splice donor sequence | Intron size (bp) | Splice acceptor sequence | Exon |
| --- | --- | --- | --- | --- |
| D | CGCAGGAGTAGGCTG | 7399 | (to C2) | |
| D′′ | TGGCGAGGTGGGACA | 7121 | (to C2) | |
| D′ | CTCCCCTGTAAGTGA | 6653 | TCTCTCATAGTTCTGGAGGC | C2 |
| C2 | TACTCCTGTAAGTCC | 14 487 | ATTGCAATAGCTATGGCAAC | C1 |
| C1 | GAACAAGGTAAAAAC | 794 | TCCATTTCAGATTCAAACTT | B |
| B | GTTGAAGGTATAGAG | 8 396 | TTGCATTCAGATTGCCTTTT | A |

B′ poly(A) site: TATTCCTGTAATTAAATATTACTTTCCCCTCAA
A poly(A) site: AGAATTTATGAATAAA

KLHL1

| Exon | Splice donor sequence | Intron size (bp) | Splice acceptor sequence | Exon |
| --- | --- | --- | --- | --- |
| 1 | GACACAGGTACAGTA | ~140 000 | TGGTTTATAGGCTGTCATCA | 2 |
| 2 | CACATAGGTACAGTA | 14 174 | ACTCTTTCAGGCTTGTTCTG | 3 |
| 3 | TATACAGGTATGGCA | >7 700 | TTATGTTTAGGCTGCTTGGA | 4 |
| 4 | CACAATGGTAAGGAA | >57 000 | CTCCCTGCAGGAAAACATAA | 5 |
| 5 | ACCACAGGTAATGAT | 43 132 | TTATTCACAGATATTGGCTG | 6 |
| 6 | AACAAAGGTATTTAA | 42 012 | GTTTTATCAGGAGCTACAAC | 7 |
| 7 | GGTCTAGGTAAGATC | 56 183 | AATATTTTAGGTGTAACAGT | 8 |
| 8 | ATGGCAAGTAAGTAA | 20 813 | GTGTTTTTAGGTTGTATTCA | 9 |
| 9 | TAGAAAGGTAAGACC | >11 573 | TGAATTTTAGATATGATCCC | 10 |
| 10 | GACACAGGTAAGATT | >5 700 | TTTTTAACAGATGGCTTCCT | 11 |

Poly(A) site: AATTACAATTAATAAATGATCAAAAAATTTGCA
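
The sequences in Table 1 can be checked against the consensus GT (donor) and AG (acceptor) dinucleotides at the intron borders. The following is a minimal Python sketch, not part of the original analysis. The offsets are assumptions inferred from the listed sequences rather than stated in the table: each 15 nt donor is taken to hold 7 exonic bases followed by the intron, and each 20 nt acceptor is taken to hold 10 intronic bases followed by the exon. The names `DONORS`, `ACCEPTORS`, `POLYA_SITES` and `check_sites` are hypothetical. The poly(A) sites are also scanned for the common AATAAA or ATTAAA hexamers.

```python
# Sequences copied from Table 1; dictionary keys are illustrative labels only.
DONORS = {
    "SCA8 D": "CGCAGGAGTAGGCTG", "SCA8 D''": "TGGCGAGGTGGGACA",
    "SCA8 D'": "CTCCCCTGTAAGTGA", "SCA8 C2": "TACTCCTGTAAGTCC",
    "SCA8 C1": "GAACAAGGTAAAAAC", "SCA8 B": "GTTGAAGGTATAGAG",
    "KLHL1 1": "GACACAGGTACAGTA", "KLHL1 2": "CACATAGGTACAGTA",
    "KLHL1 3": "TATACAGGTATGGCA", "KLHL1 4": "CACAATGGTAAGGAA",
    "KLHL1 5": "ACCACAGGTAATGAT", "KLHL1 6": "AACAAAGGTATTTAA",
    "KLHL1 7": "GGTCTAGGTAAGATC", "KLHL1 8": "ATGGCAAGTAAGTAA",
    "KLHL1 9": "TAGAAAGGTAAGACC", "KLHL1 10": "GACACAGGTAAGATT",
}
# Acceptors are keyed by the exon that follows the intron.
ACCEPTORS = {
    "SCA8 C2": "TCTCTCATAGTTCTGGAGGC", "SCA8 C1": "ATTGCAATAGCTATGGCAAC",
    "SCA8 B": "TCCATTTCAGATTCAAACTT", "SCA8 A": "TTGCATTCAGATTGCCTTTT",
    "KLHL1 2": "TGGTTTATAGGCTGTCATCA", "KLHL1 3": "ACTCTTTCAGGCTTGTTCTG",
    "KLHL1 4": "TTATGTTTAGGCTGCTTGGA", "KLHL1 5": "CTCCCTGCAGGAAAACATAA",
    "KLHL1 6": "TTATTCACAGATATTGGCTG", "KLHL1 7": "GTTTTATCAGGAGCTACAAC",
    "KLHL1 8": "AATATTTTAGGTGTAACAGT", "KLHL1 9": "GTGTTTTTAGGTTGTATTCA",
    "KLHL1 10": "TGAATTTTAGATATGATCCC", "KLHL1 11": "TTTTTAACAGATGGCTTCCT",
}
POLYA_SITES = {
    "SCA8 B'": "TATTCCTGTAATTAAATATTACTTTCCCCTCAA",
    "SCA8 A": "AGAATTTATGAATAAA",
    "KLHL1": "AATTACAATTAATAAATGATCAAAAAATTTGCA",
}

DONOR_GT_OFFSET = 7      # assumption: 7 exonic nt precede the intronic GT
ACCEPTOR_AG_OFFSET = 8   # assumption: intronic AG occupies positions 8-9
POLYA_SIGNALS = ("AATAAA", "ATTAAA")  # canonical hexamer and common variant


def check_sites():
    """Return a list of labels whose sequence lacks the expected motif."""
    failures = []
    for name, seq in DONORS.items():
        if seq[DONOR_GT_OFFSET:DONOR_GT_OFFSET + 2] != "GT":
            failures.append(f"donor {name}")
    for name, seq in ACCEPTORS.items():
        if seq[ACCEPTOR_AG_OFFSET:ACCEPTOR_AG_OFFSET + 2] != "AG":
            failures.append(f"acceptor {name}")
    for name, seq in POLYA_SITES.items():
        if not any(signal in seq for signal in POLYA_SIGNALS):
            failures.append(f"poly(A) {name}")
    return failures


if __name__ == "__main__":
    failures = check_sites()
    print("all sites match" if not failures else failures)
```

Under these assumed offsets, every donor, acceptor and poly(A) sequence listed in the table passes the check.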

References

1 Koob, M.D., Moseley, M.L., Schut, L.J., Benzow, K.A., Bird, T.D., Day, J.W. and Ranum, L.P. (1999) An untranslated CTG expansion causes a novel form of spinocerebellar ataxia (SCA8). Nature Genet., 21, 379–384.

2 Mahadevan, M., Tsilfidis, C., Sabourin, L., Shutler, G., Amemiya, C., Jansen, G., Neville, C., Narang, M., Barcelo, J., O’Hoy, K. et al. (1992) Myotonic dystrophy mutation: an unstable CTG repeat in the 3′ untranslated region of the gene. Science, 255, 1253–1255.

3 Brice, A. (1998) Unstable mutations and neurodegenerative disorders. J. Neurol., 245, 505–510.

4 Klockgether, T. and Evert, B. (1998) Genes involved in hereditary ataxias. Trends Neurosci., 21, 413–418.

5 Groenen, P. and Wieringa, B. (1998) Expanding complexity in myotonic dystrophy. Bioessays, 20, 901–912.

6 Ikeda, Y., Shizuka, M., Watanabe, M., Okamoto, K. and Shoji, M. (2000) Molecular and clinical analyses of spinocerebellar ataxia type 8 in Japan. Neurology, 54, 950–955.

7 Robinson, D.N. and Cooley, L. (1997) Drosophila kelch is an oligomeric ring canal actin organizer. J. Cell Biol., 138, 799–810.

8 Frohman, M.A., Dush, M.K. and Martin, G.R. (1988) Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proc. Natl Acad. Sci. USA, 85, 8998–9002.

9 Roux, K.H. and Dhanarajan, P. (1990) A strategy for single site PCR amplification of dsDNA: priming digested cloned or genomic DNA from an anchor-modified restriction site and a short internal sequence. Biotechniques, 8, 48–57.

10 Wheeler, D.L., Chappey, C., Lash, A.E., Leipe, D.D., Madden, T.L., Schuler, G.D., Tatusova, T.A. and Rapp, B.A. (2000) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 28, 10–14.

11 Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

12 Miller, G., Fuchs, R. and Lai, E. (1997) IMAGE cDNA clones, UniGene clustering, and ACeDB: an integrated resource for expressed sequence information. Genome Res., 7, 1027–1032.

13 Reese, M.G., Harris, N.L. and Eeckman, F.H. (1996) Large scale sequencing specific neural networks for promoter and splice site recognition. In Hunter, L. and Klein, T.E. (eds), Biocomputing: Proceedings of the 1996 Pacific Symposium. World Scientific Publishing Co, Singapore.

14 Smith, R.F., Wiese, B.A., Wojzynski, M.K., Davison, D.B. and Worley, K.C. (1996) BCM Search Launcher—an integrated interface to molecular biology data base search and analysis services available on the World Wide Web. Genome Res., 6, 454–462.

15 Bardwell, V.J. and Treisman, R. (1994) The POZ domain: a conserved protein-protein interaction motif. Genes Dev., 8, 1664–1677.

16 Zollman, S., Godt, D., Prive, G.G., Couderc, J.L. and Laski, F.A. (1994) The BTB domain, found primarily in zinc finger proteins, defines an evolutionarily conserved family that includes several developmentally regulated genes in Drosophila. Proc. Natl Acad. Sci. USA, 91, 10717–10721.

17 Bork, P. and Doolittle, R.F. (1994) Drosophila kelch motif is derived from a common enzyme fold. J. Mol. Biol., 236, 1277–1282.

18 Kim, T.A., Lim, J., Ota, S., Raja, S., Rogers, R., Rivnay, B., Avraham, H. and Avraham, S. (1998) NRP/B, a novel nuclear matrix protein, associates with p110(RB) and is involved in neuronal differentiation. J. Cell Biol., 141, 553–566.

19 Hernandez, M.C., Andres-Barquin, P.J., Martinez, S., Bulfone, A., Rubenstein, J.L. and Israel, M.A. (1997) ENC-1: a novel mammalian kelch-related gene specifically expressed in the nervous system encodes an actin-binding protein. J. Neurosci., 17, 3038–3051.

20 Soltysik-Espanola, M., Rogers, R.A., Jiang, S., Kim, T.A., Gaedigk, R., White, R.A., Avraham, H. and Avraham, S. (1999) Characterization of Mayven, a novel actin-binding protein predominantly expressed in brain. Mol. Biol. Cell, 10, 2361–2375.

21 Wagner, E.G. and Simons, R.W. (1994) Antisense RNA control in bacteria, phages, and plasmids. Annu. Rev. Microbiol., 48, 713–742.

22 Vanhee-Brossollet, C. and Vaquero, C. (1998) Do natural antisense transcripts make sense in eukaryotes? Gene, 211, 1–9.

23 Wutz, A., Smrzka, O.W., Schweifer, N., Schellander, K., Wagner, E.F. and Barlow, D.P. (1997) Imprinted expression of the Igf2r gene depends on an intronic CpG island. Nature, 389, 745–749.

24 Munroe, S.H. and Lazar, M.A. (1991) Inhibition of c-erbA mRNA splicing by a naturally occurring antisense RNA. J. Biol. Chem., 266, 22083–22086.

25 Okano, H., Aruga, J., Nakagawa, T., Shiota, C. and Mikoshiba, K. (1991) Myelin basic protein gene and the function of antisense RNA in its repression in myelin-deficient mutant mouse. J. Neurochem., 56, 560–567.

26 Hildebrandt, M. and Nellen, W. (1992) Differential antisense transcription from the Dictyostelium EB4 gene locus: implications on antisense-mediated regulation of mRNA stability. Cell, 69, 197–204.

27 Savage, M.P. and Fallon, J.F. (1995) FGF-2 mRNA and its antisense message are expressed in a developmentally specific manner in the chick limb bud and mesonephros. Dev. Dyn., 202, 343–353.

28 Kimelman, D. and Kirschner, M.W. (1989) An antisense mRNA directs the covalent modification of the transcript encoding fibroblast growth factor in Xenopus oocytes. Cell, 59, 687–696.

29 Wightman, B., Ha, I. and Ruvkun, G. (1993) Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell, 75, 855–862.

30 Lisitsyn, N., Lisitsyn, N. and Wigler, M. (1993) Cloning the differences between two complex genomes. Science, 259, 946–951.

31 Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A. and Struhl, K. (1987) Current Protocols in Molecular Biology. Greene Publ. Associates and Wiley-Interscience, New York.