Introduction

Limb-girdle muscular dystrophies are a heterogeneous group of disorders; the autosomal recessive forms (LGMD2) are caused by mutations in at least 15 genes [1, 2]. Characterization of the mutational spectrum of a disease in diverse populations is necessary to understand its pathogenesis and epidemiology. The frequencies of different LGMD2 subtypes and mutations vary considerably among populations [1], but to our knowledge, the genetic profile of LGMD2 in Saudi Arabian patients has not been previously described. Molecular diagnosis is also challenging in practice: screening all known LGMD2 genes is costly and time-consuming (TTN alone has at least 312 exons), yet screening only the seven most commonly mutated genes can fail to identify a mutation in approximately 30–40% of patients [3, 4].

Homozygosity mapping is an effective technique for localizing rare recessive mutations, as consanguineous pedigrees provide disproportionately high power for linkage analysis relative to the number of individuals genotyped [5]. In addition, high-density microarrays that assay thousands of single nucleotide polymorphisms (SNPs) allow a more informative and less expensive genomewide linkage scan than conventional microsatellite panels [6]. We hypothesized that it would be cost-effective to focus the search for pathogenic LGMD2 mutations by performing SNP-based linkage analysis in consanguineous families, including those not informative enough to achieve suggestive or significant LOD scores. Here, we report the discovery of five novel mutations in four genes via homozygosity mapping in 11 Saudi Arabian families, accomplished at a fraction of the cost required to sequence all 15 LGMD2 genes.
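To illustrate why consanguineous pedigrees are so informative relative to the number of individuals genotyped, the following Python sketch computes an upper bound on the LOD score at a recessive locus for n affected full siblings. It assumes, for illustration only, first-cousin parents (who share exactly one allele identical by descent at a given locus with probability 1/4, and never two), a rare fully penetrant disease allele, and perfectly informative markers. The function name and parameters are hypothetical and are not part of any linkage package.

```python
import math


def max_lod_affected_sibs(n_affected: int, p_parents_share_ibd: float = 0.25) -> float:
    """Upper-bound LOD at a recessive locus for n affected full sibs.

    Assumes (illustratively) fully informative markers, a rare fully
    penetrant recessive allele, and parents who share one allele IBD at
    the locus with probability p_parents_share_ibd (1/4 for first cousins).
    """
    if n_affected < 1:
        raise ValueError("need at least one affected individual")
    # Each affected sib must receive the shared IBD allele from both
    # parents: probability (1/2) * (1/2) = 1/4 per sib.
    p_by_chance = p_parents_share_ibd * 0.25 ** n_affected
    # Under linkage the autozygosity pattern is certain, so the LOD is
    # log10(1 / probability of the same pattern arising by chance).
    return -math.log10(p_by_chance)


for n in range(1, 5):
    print(n, round(max_lod_affected_sibs(n), 2))
```

Under these assumptions a single affected child gives log10(16) ≈ 1.20 and two affected siblings give log10(64) ≈ 1.81; each additional affected sibling adds log10(4) ≈ 0.60. Real multipoint scores fall below this bound when markers are not fully informative, and additional consanguineous loops change the chance probability.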

Materials and methods

Thirteen families of Saudi Arabian descent in which patients appeared to be affected with LGMD2 were recruited via King Khalid University Hospital and Security Forces Hospital, Riyadh, Saudi Arabia. Written informed consent was obtained for all subjects, and de-identified genomic DNA samples were transferred to Children’s Hospital Boston, in accordance with the Institutional Review Boards of the above institutions. The presumed diagnosis of LGMD2 was made on the basis of clinical presentation, serum creatine kinase levels, and muscle biopsy findings (including immunohistochemistry when available), as summarized in Table 1.

Table 1 Clinical presentation of 13 families with autosomal recessive muscular dystrophy or myopathy

Samples from 11 families were genotyped at 10,204 SNPs using the GeneChip Human Mapping 10K 2.0 Array (Affymetrix). To minimize costs, we genotyped primarily affected individuals, with unaffected subjects genotyped as needed (Fig. 1). Genomewide multipoint parametric linkage scans were performed using MERLIN v1.1.2 [7]. The disease allele frequency was set to 0.0001, and we used a full penetrance, zero phenocopy model. Marker map positions and Caucasian allele frequencies were provided by Affymetrix. The error checking and Pedwipe functions of MERLIN were used to remove unlikely genotypes. Based on results from families 1186–1191, families 1223–1229 were screened for mutations in SGCA (also known as adhalin) and FKRP prior to 10K genotyping; two families (1224 and 1227) in which mutations were identified were therefore not genotyped. Amplification of candidate gene exons and splice junctions by polymerase chain reaction (PCR) and sequencing of purified products were performed by standard protocols, and sequence data were analyzed using Sequencher v4.8 (Gene Codes) and SeqScape v2.5 (Applied Biosystems). Mutations were genotyped in DNA samples from unrelated control subjects using Custom TaqMan SNP Genotyping Assays (Applied Biosystems). Multi-species sequence alignments were performed in ClustalW [8], and the impact of amino acid substitutions on protein function was predicted by PolyPhen [9].
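Setting the disease allele frequency to 0.0001 under a fully penetrant, zero-phenocopy recessive model implies that an affected child of consanguineous parents is almost certainly homozygous by descent (autozygous) rather than homozygous by chance. A minimal Python sketch of this single-locus calculation follows; the inbreeding coefficient f = 1/16 (first-cousin parents) is an illustrative assumption, and MERLIN itself evaluates full multipoint pedigree likelihoods rather than this quantity.

```python
def prob_autozygous_given_affected(q: float, f: float) -> float:
    """P(an affected child is autozygous at the locus), fully penetrant recessive.

    q: disease allele frequency; f: inbreeding coefficient of the child.
    With inbreeding, P(homozygous for the disease allele) = f*q + (1 - f)*q**2,
    of which f*q arises from autozygosity (both copies identical by descent).
    """
    autozygous = f * q
    by_state = (1 - f) * q ** 2  # homozygous by chance, not by descent
    return autozygous / (autozygous + by_state)


# q matches the linkage model; f = 1/16 assumes first-cousin parents.
p = prob_autozygous_given_affected(q=0.0001, f=1 / 16)
print(f"{p:.4f}")
```

With q = 0.0001 and f = 1/16 the probability is about 0.9985, so nearly all affected children in such pedigrees are expected to be autozygous at the disease locus, which is the premise of homozygosity mapping.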

Fig. 1

Pedigrees of 13 families with autosomal recessive muscular dystrophy or myopathy. Squares males, circles females, filled symbols affected individuals, partially black-filled symbols obligate heterozygous carriers, partially gray-filled symbols possible heterozygous carriers, open symbols unaffected individuals. Double bars represent consanguineous unions. Asterisks denote individuals genotyped for genomewide linkage analysis.

Results

The clinical presentations and laboratory results were generally typical of LGMD2, except in the case of family 1223, in which creatine kinase levels were not elevated and calf pseudohypertrophy was absent (Table 1). The kindreds 1186–1191 and 1223–1229 were consanguineous and displayed autosomal recessive inheritance (Fig. 1). Mutations were identified in all 13 families, and for the 11 genotyped families, the mutations were in genes implicated by homozygosity mapping (Table 2). Five families produced one or more linkage peaks that together contained a single known LGMD2 gene, including one family (1225) with a maximum LOD score of 1.8 in which we genotyped only two affected siblings. Four families generated linkage peaks containing two known LGMD2 genes, including a family (1228) with a maximum LOD score of 1.2 in which only a single affected individual was genotyped. For three families in which two linked genes resided in different intervals, the LOD score was appreciably higher for the peak containing the gene in which a mutation was found. In family 1191, two linked genes in a single interval, SGCA and TCAP, were both sequenced and TCAP was negative for mutations.

Table 2 Results of linkage analysis and mutation screening

Family 1226 showed linkage at four known LGMD2 genes, but families 1226, 1228, and 1229 shared a common haplotype across the portion of their linkage peaks containing SGCB (Online Resource 1). This allowed us to correctly predict that all three families carried the same ancestral SGCB mutation and thus to avoid sequencing the five other candidate genes linked in these families. Two genotyped families with the c.941C>T (p.T314M) mutation in FKRP also shared an identical haplotype across FKRP (data not shown), again implying that these families inherited the mutation from a common ancestor. A third, ungenotyped family also carried this novel mutation, suggesting that c.941C>T may be a founder mutation in Saudi Arabia.
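The shared-haplotype argument can be sketched as a search for the longest stretch of consecutive SNPs at which probands from two families are homozygous for the same allele. This Python sketch is a simplification under stated assumptions: genotypes are coded 0/1/2 as counts of one allele along an ordered SNP list, missing calls (coded here as -1, a hypothetical convention) break a run, and identical homozygous genotypes are treated as consistent with, not proof of, a common ancestral haplotype. The toy genotypes are invented for illustration.

```python
from typing import List, Tuple

MISSING = -1  # hypothetical no-call code


def longest_shared_homozygous_run(g1: List[int], g2: List[int]) -> Tuple[int, int]:
    """Return (start_index, length) of the longest run of consecutive SNPs at
    which both probands are homozygous (0 or 2) for the same allele."""
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same ordered SNPs")
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (a, b) in enumerate(zip(g1, g2)):
        if a == b and a in (0, 2):  # shared homozygous call; 1 and MISSING fail
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # heterozygous, discordant, or missing call ends the run
    return best_start, best_len


# Toy genotypes for two probands over eight ordered SNPs (hypothetical).
proband_a = [1, 0, 2, 2, 0, 2, 1, 0]
proband_b = [0, 0, 2, 2, 0, 2, MISSING, 0]
print(longest_shared_homozygous_run(proband_a, proband_b))
```

In practice one would compare such runs against the interval spanned by a candidate gene; a long shared run across the gene makes a common founder mutation more plausible, while a short one may simply reflect common alleles.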

In the 11th genotyped family, 1223, preliminary linkage results from a single affected individual produced intervals containing CAPN3 and DYSF. The proband was homozygous for a DYSF mutation, c.6062G>A (p.R2021Q), previously reported (as p.R2000Q) in the compound heterozygous state with a nonsense mutation in a family with Miyoshi myopathy [10]. However, c.6062G>A was also homozygous in a confirmed unaffected sibling in family 1223 and was heterozygous in three of 368 unrelated Kuwaiti control subjects, indicating that this variant is either a benign polymorphism or pathogenic only in combination with a more severe mutation. Genomewide genotyping of additional members of family 1223 excluded both CAPN3 and DYSF by linkage, but a linked interval on chromosome 1 contained SEPN1, a gene in which mutations cause several muscle diseases, including congenital muscular dystrophy (CMD) and multiminicore disease (MmD) [11, 12]. Although affected members of family 1223 showed the typical LGMD pattern of weakness, careful review of their clinical data also revealed scoliosis, respiratory difficulty treated by tracheostomy, and focal areas of myofibrillar disorganization and Z-band streaming in biopsy tissue, consistent with a diagnosis of CMD or MmD. We identified two novel homozygous missense variants in SEPN1 that segregated with the disease: c.467T>C (p.L156P) and c.1654G>A (p.E552K). However, c.1654G>A was heterozygous in eight of 374 unrelated Kuwaiti controls, the p.E552 residue was not completely conserved (Online Resource 2), and PolyPhen predicted p.E552K to be less detrimental than p.L156P (data not shown). Together, these data strongly suggest that c.1654G>A is an uncommon polymorphism and that c.467T>C (p.L156P) is the pathogenic mutation in family 1223.
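The control counts above translate directly into allele frequencies. This Python sketch computes them, assuming that no control was homozygous for either variant (homozygote counts are not reported) and that controls are unrelated. It also gives the expected frequency of homozygotes using the standard inbreeding-adjusted formula (1 - f)q^2 + fq, where the inbreeding coefficient f is an illustrative parameter. These numbers are one line of evidence only and do not by themselves establish whether a variant is benign.

```python
def allele_frequency(n_het: int, n_hom_alt: int, n_individuals: int) -> float:
    """Alternate-allele frequency from diploid genotype counts."""
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


def expected_homozygote_frequency(q: float, f: float = 0.0) -> float:
    """Expected frequency of alt/alt homozygotes; f = 0 reduces to q**2 (Hardy-Weinberg)."""
    return (1 - f) * q ** 2 + f * q


# Control counts from the text; zero homozygotes is an assumption.
q_dysf = allele_frequency(n_het=3, n_hom_alt=0, n_individuals=368)   # DYSF c.6062G>A
q_sepn1 = allele_frequency(n_het=8, n_hom_alt=0, n_individuals=374)  # SEPN1 c.1654G>A

for label, q in [("DYSF c.6062G>A", q_dysf), ("SEPN1 c.1654G>A", q_sepn1)]:
    hom = expected_homozygote_frequency(q)
    print(f"{label}: q = {q:.4f}, expected homozygotes (f = 0) = {hom:.2e}")
```

These counts correspond to allele frequencies of roughly 0.4% and 1.1%, respectively.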

Finally, mutations in two commonly mutated genes (SGCA and FKRP) were found in the two families that were not subjected to linkage analysis. In total, we identified eight pathogenic mutations in six genes in 13 families; five mutations in four genes were novel, and three had been previously reported [13–15]. The three novel missense mutations were absent from at least 368 Kuwaiti control DNA samples, and the novel splice site mutation was absent from 386 Kuwaiti and 105 Southeastern European controls. The residues and splice-site nucleotides affected by all four novel pathogenic missense and splice mutations were completely conserved in all vertebrate species analyzed (Online Resource 2).
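Absence of a variant from a control panel bounds its population frequency. The Python sketch below computes an exact one-sided 95% upper confidence bound on allele frequency when a variant is seen on none of 2n control chromosomes, alongside the familiar rule-of-three approximation 3/(2n). Pooling the 386 Kuwaiti and 105 Southeastern European controls into one panel, and treating all chromosomes as independent, are simplifying assumptions made for illustration.

```python
def upper_bound_absent(n_individuals: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound on allele frequency when the
    allele is observed on none of 2 * n_individuals chromosomes.

    Solves (1 - p) ** n_chrom == alpha for p.
    """
    n_chrom = 2 * n_individuals
    return 1 - alpha ** (1 / n_chrom)


panels = [("missense, >= 368 controls", 368), ("splice, 386 + 105 controls", 386 + 105)]
for label, n in panels:
    exact = upper_bound_absent(n)
    rule_of_three = 3 / (2 * n)  # large-sample approximation to the same bound
    print(f"{label}: exact {exact:.4f}, rule of three {rule_of_three:.4f}")
```

For 368 controls both methods give an upper bound of roughly 0.4%; for the pooled 491 controls, roughly 0.3%.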

Discussion

We conducted, to our knowledge, the first genetic characterization of LGMD in a Saudi Arabian population. Seven cases of Miyoshi myopathy from Saudi Arabia have been described clinically; although mutations were not identified in these patients [16], mutations in DYSF have since been shown to cause both Miyoshi myopathy and LGMD2B [17]. In our study, homozygosity mapping via linkage analysis efficiently identified five novel and three known mutations in five commonly mutated LGMD2 genes (SGCA, SGCB, SGCD, SGCG, and FKRP) and a congenital myopathy gene (SEPN1). Multiple alternatives to linkage analysis are frequently used for homozygosity mapping [2, 18, 19], usually because they are technically simpler or computationally faster than a linkage scan, especially with very large pedigrees or exceptionally dense marker panels [20]. However, for a trait in which many genes have already been identified, the probability that any particular family carries a mutation in a known gene is high. In this scenario, it is likely more cost-effective, for pedigrees with known consanguinity, to genotype only a few select informative individuals on the least expensive SNP-based platforms. Our study confirms that under these conditions, homozygosity mapping can be readily accomplished by standard parametric linkage analysis [21]. Compared with other methods, linkage analysis can distinguish blocks of autozygosity from blocks of uninformative markers (which are not uncommon on high-density SNP arrays) when parents and/or unaffected siblings are genotyped, and it provides a measure of the statistical significance of linkage peaks that can be interpreted by conventional standards.
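The point about uninformative markers can be made concrete. When both parents are homozygous at a SNP, the affected child's homozygosity there is forced by the parental genotypes and carries no information about autozygosity; when at least one parent is heterozygous, the child's homozygosity is informative. The following Python sketch labels each SNP for a parent-offspring trio on that basis. It is a simplified illustration, not the likelihood calculation that MERLIN performs, and it assumes genotypes coded 0/1/2 with no missing calls and no Mendelian errors. The function name and example genotypes are hypothetical.

```python
from typing import List


def classify_homozygous_snps(child: List[int], father: List[int], mother: List[int]) -> List[str]:
    """Label each SNP for an affected child (genotypes coded 0/1/2).

    'het'           child heterozygous: inconsistent with autozygosity here
    'informative'   child homozygous and at least one parent heterozygous,
                    so the child's homozygosity was not forced
    'uninformative' child homozygous and both parents homozygous, so the
                    child's genotype was fully determined by the parents
    """
    if not (len(child) == len(father) == len(mother)):
        raise ValueError("trio genotype vectors must have equal length")
    labels = []
    for c, f, m in zip(child, father, mother):
        if c == 1:
            labels.append("het")
        elif f == 1 or m == 1:
            labels.append("informative")
        else:
            labels.append("uninformative")
    return labels


trio_child = [2, 2, 0, 0, 1, 2]
trio_father = [2, 1, 1, 0, 1, 2]
trio_mother = [2, 1, 0, 0, 0, 1]
labels = classify_homozygous_snps(trio_child, trio_father, trio_mother)
print(labels, labels.count("informative"))
```

Counting only informative homozygous SNPs, rather than all homozygous SNPs, avoids mistaking a stretch of monomorphic or low-diversity markers for a block of autozygosity.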

We further highlight the utility of linkage analysis as a screening method even in marginally informative families. Nine of 11 genotyped families produced maximum LOD scores below the threshold of 3.3 for significant genomewide linkage, and five of those families fell below the threshold of 1.9 for suggestive linkage [22]. Nevertheless, linkage analysis excluded at least 13 of the 15 known genes in nine of ten families with LGMD2, and the LOD scores favored a specific gene in four of five families in which multiple known genes were linked. Comparing haplotypes within linkage peaks shared among families further increased the power of our analyses by identifying families likely to carry the same mutation inherited from a common ancestor. Accordingly, a mutation was identified in nine of ten genotyped LGMD2 families by sequencing a single gene, so conducting linkage scans saved considerable time and sequencing cost.
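Screening by exclusion amounts to intersecting candidate-gene coordinates with linkage intervals. This Python sketch retains genes that overlap any linkage peak and treats the rest as excluded. The gene names and coordinates are hypothetical placeholders, not real LGMD2 loci, and equating "outside a peak" with "excluded" is a simplification (formal exclusion is conventionally based on LOD scores below -2).

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), inclusive bounds


def genes_in_peaks(genes: Dict[str, Interval], peaks: List[Interval]) -> List[str]:
    """Return candidate genes that overlap at least one linkage peak."""
    retained = []
    for name, (g_chr, g_start, g_end) in genes.items():
        for p_chr, p_start, p_end in peaks:
            # Two intervals on the same chromosome overlap if each starts
            # before the other ends.
            if g_chr == p_chr and g_start <= p_end and p_start <= g_end:
                retained.append(name)
                break
    return retained


# Hypothetical genes and peaks; coordinates are made up for illustration only.
candidate_genes = {
    "GENE_A": ("chr2", 1_000_000, 1_050_000),
    "GENE_B": ("chr5", 20_000_000, 20_010_000),
    "GENE_C": ("chr17", 45_000_000, 45_120_000),
}
linkage_peaks = [("chr17", 44_000_000, 52_000_000), ("chr2", 3_000_000, 9_000_000)]

retained = genes_in_peaks(candidate_genes, linkage_peaks)
excluded = [g for g in candidate_genes if g not in retained]
print("sequence first:", retained)
print("excluded by linkage:", excluded)
```

When more than one gene is retained, ranking them by the LOD score of the peak that contains them mirrors the prioritization described above.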

Surprisingly, none of the families showed linkage to CAPN3 or DYSF, although these are the two most commonly mutated genes among LGMD2 patients in some populations [3, 4, 23], suggesting that LGMD2 subtypes A and B may be less common in Saudi Arabia than elsewhere. Because CAPN3 and DYSF each require at least 30 amplicons to sequence their exons, the cost of PCR and bidirectional Sanger sequencing of either gene in a single patient exceeds the cost of processing one 10K SNP array ($280 at Children’s Hospital Boston), further illustrating the economy of our approach. Moreover, the exclusion of all 15 known LGMD2 genes in family 1223 and the subsequent identification of a pathogenic mutation in the myopathy gene SEPN1 demonstrate how linkage analysis can help resolve ambiguous clinical diagnoses; such ambiguity may also help account for the high failure rate of mutation searches by sequencing alone.
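The cost comparison reduces to break-even arithmetic: with at least 30 amplicons per gene and $280 per 10K array, sequencing one gene costs more than one array whenever PCR plus bidirectional sequencing of an amplicon costs more than about $9.33. This Python sketch makes that explicit; the per-reaction prices in the example call are hypothetical and are not taken from the study.

```python
def sanger_cost(n_amplicons: int, pcr_cost: float, read_cost: float,
                reads_per_amplicon: int = 2) -> float:
    """Cost of PCR plus bidirectional Sanger sequencing of one gene in one patient."""
    return n_amplicons * (pcr_cost + reads_per_amplicon * read_cost)


ARRAY_COST = 280.0  # one 10K SNP array at Children's Hospital Boston, per the text
N_AMPLICONS = 30    # lower bound given for CAPN3 or DYSF

# Cost per amplicon (PCR + two reads) above which one gene exceeds one array.
break_even = ARRAY_COST / N_AMPLICONS
print(f"break-even per amplicon: ${break_even:.2f}")

# Hypothetical per-reaction prices, for illustration only.
print(sanger_cost(N_AMPLICONS, pcr_cost=2.0, read_cost=4.0))
```

Because the array cost is fixed while sequencing cost scales with the number of amplicons, the advantage of genotyping grows with the number and size of the candidate genes.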

Another alternative to comprehensive sequencing is molecular diagnosis by protein analysis, as six of the seven most commonly mutated LGMD2 genes (CAPN3, DYSF, SGCA, SGCB, SGCD, and SGCG) encode proteins that are absent or severely reduced on immunoblot or immunohistochemistry in the corresponding LGMD2 subtypes [4]. Although this strategy depends on the availability of well-preserved biopsy tissue, it may be inexpensive and effective in some cases and thus advantageous for non-consanguineous families or sporadic patients. However, protein-based diagnostic testing can be time-consuming and laborious, can require separate optimization of conditions for each test, or can yield ambiguous findings, as when loss of a single sarcoglycan subunit disrupts the other subunits. Like Sanger sequencing, protein assays test each hypothesis individually, and the frequency with which a particular test identifies the molecular defect depends on the LGMD2 subtype distribution in the population. For infrequently studied populations, this distribution may be unknown or unusual, and for patients in whom protein tests are unsuccessful, genetic analyses may then be necessary, possibly negating any initial cost savings.

By contrast, genomewide homozygosity mapping offers a single, unbiased, simultaneous test of all possible loci, and in cases where the most obvious candidate genes are excluded, linkage data can direct attention to novel genes or genes associated with similar phenotypes. While homozygosity mapping is most effective in families known to be consanguineous, it can also be applied successfully to outbred populations when higher density SNP arrays are used [24]. Homozygosity mapping may therefore provide a faster and less expensive approach to molecular diagnosis of genetically heterogeneous traits in many families than comprehensive screening of candidate genes by traditional sequencing methods.