Elsevier

Gene

Volume 298, Issue 1, 18 September 2002, Pages 59-68

Identification and characterization of the human long form of Sox5 (L-SOX5) gene

https://doi.org/10.1016/S0378-1119(02)00927-7

Abstract

The Sox (Sry-type HMG box) group of transcription factors, which is defined by a high-mobility group (HMG) DNA-binding domain, is categorized into six subfamilies. Sox5 and Sox6 belong to the group D subfamily, which is characterized by conserved N-terminal domains including a leucine-zipper, a coiled-coil domain and a Q-box. Group D Sox genes are expressed as long and short transcripts that exhibit differential expression patterns. In mouse, the long form of Sox5, L-Sox5, is co-expressed and interacts with Sox6; together, these two proteins appear to play a key role in chondrogenesis and myogenesis. In humans, however, only the short form of Sox5 has previously been identified. To gain insight into Sox5 function, we have identified and characterized human L-SOX5. The human L-SOX5 cDNA encodes a 763-amino-acid protein that is 416 residues longer than the short form and contains all of the characteristic motifs of group D Sox proteins. The predicted L-SOX5 protein shares 97% amino acid identity with its mouse counterpart and 59% identity with human SOX6. The L-SOX5 gene contains 18 exons and shows similar genomic structure to SOX6. We have identified two transcription start sites in L-SOX5 and multiple alternatively spliced mRNA variants that are distinct from the short form. Unlike the short form, which shows testis-specific expression, L-SOX5 is expressed in multiple tissues. Like SOX6, L-SOX5 shows strong expression in chondrocytes and striated muscles, indicating a likely role in human cartilage and muscle development.

Introduction

The Sox (Sry-type HMG box) family of transcription factors is related to the testis-determining gene Sry and is defined by the presence of a high-mobility group (HMG) DNA-binding domain. Sox proteins interact with DNA through this domain in a sequence-specific manner, binding in the minor groove of DNA and inducing a significant bend (Ferrari et al., 1992, Connor et al., 1994). Sox proteins are therefore thought to function as architectural proteins that organize local chromatin structure, assemble other DNA-binding transcription factors and induce correct gene expression (Werner and Burley, 1997, Wolffe, 1994). Sox function is critical to a number of developmental processes, including sex determination (SOX9) (Foster et al., 1994, Wagner et al., 1994), lens development (Sox1, 2 and 3) (Kamachi et al., 1998), T-cell differentiation (Sox4) (van de Wetering et al., 1993), endocardial ridge development (Sox4) (Schilham et al., 1996), cardiac and skeletal muscle development (Sox6) (Hagiwara et al., 2000) and neural crest cell differentiation (Sox10) (Southard-Smith et al., 1998).

Sox proteins are categorized into six subfamilies based on sequence homology within the HMG box and other domains (Pevny and Lovell-Badge, 1997). The group D subfamily consists of Sox5, Sox6 and Sox13. Although many Sox proteins are encoded by a single exon, group D Sox genes contain multiple exons with conserved genomic structure (Wunderle et al., 1996, Argentaro et al., 2000). They also share conserved N-terminal domains, including a leucine zipper, coiled-coil domains and a glutamine-rich region (Q-box), as well as the highly conserved HMG domain (Kido et al., 1998), which are considered critical to their function.

Group D Sox genes are known to express two types of alternatively spliced transcripts, short and long forms (Hiraoka et al., 1998, Argentaro et al., 2000). The short form is a component of the long form and lacks the characteristic N-terminal domains. The two forms exhibit different expression patterns. In mouse, the short forms of Sox5 and Sox6 are predominantly expressed in testis (Connor et al., 1995, Denny et al., 1992). In contrast, the long form of Sox6 is expressed in multiple tissues, especially in skeletal muscle, and the long form of Sox5 (L-Sox5) is primarily expressed in cartilage (Lefebvre et al., 1998). Co-expressed with Sox9 during chondrogenesis, L-Sox5 and Sox6 heterodimerize via their coiled-coil domains and activate the type II collagen gene (Col2A1), which encodes a major matrix protein of cartilage (Lefebvre et al., 1998).

In humans, only the short form of the SOX5 gene has previously been identified (Wunderle et al., 1996). This short form shares high homology with mouse Sox5. Human SOX6, which is predicted to interact with the long form of SOX5, has recently been identified (Cohen-Barak et al., 2001). These facts strongly support the existence of a long form of human SOX5 (L-SOX5). Identification of the human L-SOX5 gene will be an important first step toward determining the precise role of the group D SOX genes in human chondrogenesis and other developmental pathways.

Here we report the isolation and characterization of the human L-SOX5 gene. L-SOX5 contains all cardinal motifs common to group D SOX proteins. Like its mouse counterpart, L-SOX5 has multiple transcriptional start sites and multiple alternative splicing variants, but it shows a unique expression pattern in human tissues.

5′- and 3′-RACE

To extend the SOX5 sequence in the 5′- and 3′-directions, we performed rapid amplification of cDNA ends (RACE). Because mouse L-Sox5 is expressed in both liver and cultured chondrocytes, and human SOX5 is expressed in testis, we used three templates: Marathon-Ready human liver and testis cDNAs (Clontech, Palo Alto, CA) and a custom-made cDNA from cultured human chondrocytes, prepared with the Marathon cDNA Amplification Kit (Clontech) according to the manufacturer's instructions.

The 5′-RACE was

Identification of cDNAs encoding human L-SOX5

Through 5′-RACE analysis, we identified two distinct L-SOX5 transcripts. The longest cDNA sequence obtained by RACE analysis, along with its deduced amino-acid sequence, is shown in Fig. 1 (DDBJ accession no. AB081588). The cDNA sequence contains an open reading frame that encodes a 763-amino-acid protein, which is 416 amino acids longer than the SOX5 short form (GDB 5584271). The translation start site for the short form of SOX5 is located at codon 417 in L-SOX5. An in-frame

Discussion

We have identified the human L-SOX5 cDNA. The predicted human L-SOX5 protein, which is more than twice as large as the short form, contains motifs that are characteristic of group D Sox proteins but absent from the short form. These motifs are highly conserved between L-SOX5 and SOX6, and between human and mouse L-Sox5. This intra-familial and inter-species conservation highlights the importance of these motifs in the function of group D Sox genes. In mouse, L-Sox5 and Sox6 proteins form homo-

Acknowledgements

We thank Drs. Masayoshi Namba and Hidetoshi Okabe for help with the cell line culture studies, and Mss. Aya Narita and Tomoko Kusadokoro for excellent technical assistance.

References (27)

  • T. Chano et al. Characterization of a newly established human chondrosarcoma cell line, CS-OKB. Virchows Arch. (1998)
  • F. Connor et al. DNA binding and bending properties of the post-meiotically expressed Sry-related protein Sox-5. Nucleic Acids Res. (1994)
  • F. Connor et al. The Sry-related HMG box-containing gene Sox6 is expressed in the adult testis and developing nervous system of the mouse. Nucleic Acids Res. (1995)