Introduction

Autism spectrum disorders (ASD; MIM 209850) are characterized by a triad of symptoms, involving abnormalities in reciprocal social interaction and communication as well as stereotyped patterns of interests and activities.1, 2 They comprise a broad group of disorders including autism, Asperger syndrome (AS), disintegrative disorder, and atypical forms of autism. The population prevalence for autism is 4–10/10 000, whereas the total prevalence for ASDs is 10–60/10 000.3, 4, 5 Twin and family studies have supported a strong genetic component, but the predisposing genetic mechanisms are still largely unknown.6, 7 The predominant hypothesis is that a combination of multiple predisposing genetic and environmental factors is involved in the etiology.8, 9 However, autism is sometimes associated with syndromes caused by rare high-penetrance mutations or cytogenetic abnormalities.10, 11, 12, 13

Recently, two X-chromosomal neuroligin genes, NLGN3 and NLGN4, were shown to be mutated in patients with ASDs. Two affected males with autism and AS in a Swedish family had an insertion in NLGN4, which led to premature truncation of the protein (D396X). In another Swedish family with autism and AS, an R451C substitution in NLGN3 was identified.14 In addition, a truncating NLGN4 mutation (D429X) was later reported in a large family, where mental retardation with or without ASDs segregated X-chromosomally.15 These findings lend further support to the hypothesis that some autism cases might be caused by rare mutations with a major effect. Another important conclusion is that similar truncating NLGN4 mutations seem to result in a wide variety of phenotypes ranging from autism to the milder AS and to mental retardation without autistic features.

Neuroligins are essential components of synaptogenesis. They are localized postsynaptically in the glutamatergic synapses and interact with presynaptic β-neurexin as well as with postsynaptic density proteins PSD-95 and S-SCAM.16, 17, 18, 19 Neuroligin-β-neurexin interaction triggers the formation of functional presynaptic structures involving synaptic vesicles and exocytotic apparatus.20, 21, 22 Several lines of evidence indicate that the reported NLGN3 and NLGN4 mutations have deleterious effects at the molecular level. First, both of the truncating mutations reported by Jamain et al14 and Laumonnier et al15 involve deletion of the AChE-homologous domain, which is required for oligomerization and synapse-promoting activity of neuroligins.21 Second, Comoletti et al23 reported markedly diminished β-neurexin binding activity of the R451C mutant of NLGN3 compared to the wild-type protein. Third, both the R451C mutation in NLGN3 and the D396X mutation in NLGN4 lead to retention of the protein in the endoplasmic reticulum, and cell surface levels of the mutants are significantly lower than the levels of the wild-type protein. Finally, neither of these mutants seems to promote presynaptic differentiation.22, 23

We considered neuroligins as primary candidates in the Finnish autism sample since both 3q25–27 and Xq13 loci harboring NLGN1 and NLGN3 genes were among the strongest linkage regions in our previous genome-wide scan.24 Here, we present a detailed molecular genetic analysis of four neuroligin genes, NLGN1, NLGN3, NLGN4, and NLGN4Y, in Finnish families with ASD.

Subjects and methods

Subjects

The families were selected after detailed clinical and medical examinations as described elsewhere.24 Diagnoses were assessed according to ICD-10 and DSM-IV diagnostic nomenclatures.1, 2 Families with associated medical conditions including fragile-X syndrome, chromosomal aberrations, neurocutaneous syndromes, and profound mental retardation were excluded from the sample. Three liability classes were generated for the statistical analyses. Liability class 1 (LC1) included only individuals fulfilling the strict diagnostic criteria for autism according to ICD-10 and DSM-IV. Liability class 2 (LC2) also included individuals with AS, and liability class 3 (LC3) had individuals with developmental dysphasia assigned as affected. In this study, only families having at least one individual fulfilling the strict criteria for LC1 were included in the statistical analyses in order to improve sample homogeneity. In other words, the broader classifications were present only in the proband's siblings. In the linkage analysis, the total number of families was 19 in LC1, 26 in LC2, and 33 in LC3. Additionally, 67 trios were used for the association analyses, yielding a total sample of 100 families. Only the individuals fulfilling the strict diagnostic criteria for LC1 were included in the association analyses (Table 1). The study was approved by the relevant ethical committees and informed consent was obtained from the participating individuals or their parents.

Table 1 Description of study sample

Mutation analysis

We analyzed the entire protein coding sequence and the splice sites of NLGN1, NLGN3, NLGN4, and NLGN4Y genes by direct sequencing in a sample of 30 (nmale=26; nfemale=4) probands with ASDs (Figure 1). One proband from each of the families sharing at least one haplotype identical-by-descent on Xq13 and/or 3q25–27 loci in the previous genome-wide screen was chosen for the analyses.24 Our genome-wide screen of a separate sample of AS families did not provide strong evidence for linkage at any of the neuroligin loci.25 However, two individual families produced positive LOD scores for 3q25–27, and therefore a proband from each of these families was included in the mutation analysis. A total of 27 individuals had a diagnosis of autism (LC1) and three individuals were diagnosed as having AS (LC2) (Table 1). In addition, two healthy controls were included in the analyses. The identified variants were analyzed in a sample of 85–108 Finnish controls.

Figure 1

Schematic overview of NLGN1, NLGN3, and NLGN4 genes. Genomic structures of the genes, analyzed SNPs and microsatellites as well as the identified sequence variants are presented. The sequence variant numbers correspond to the numbers in Table 2.

The coding sequence was amplified by PCR (primer sequences are available from the authors on request) and the specificity of the PCR products was assessed by 1.5% agarose gel electrophoresis. The PCR products were purified with exonuclease I (USB Corporation) and shrimp alkaline phosphatase (USB Corporation) treatment and the sequencing reactions were performed using the Big Dye Terminator v.3.1 kit (Applied Biosystems) according to the manufacturer's instructions. Electrophoresis was performed on an ABI3730 DNA sequencer (Applied Biosystems). The sequence analyses were performed with Sequencher 4.0.5 software (Gene Codes Corporation). For splice site predictions we used the GeneSplicer Web Interface (http://www.tigr.org/tdb/GeneSplicer/gene_spl.html), the Berkeley Drosophila Genome Project Web page (http://www.fruitfly.org/seq_tools/splice.html), and the NetGene2 Server (http://www.cbs.dtu.dk/services/NetGene2/).

Microsatellite genotyping

A total of 16 microsatellite markers were genotyped in a sample of 100 families. Genotyping of flanking markers for the genes under study in the previous genome-wide set of markers was extended to the complete family material.24 In addition, intragenic markers were selected from the UCSC Human Genome Browser (http://genome.ucsc.edu/, July 2003 assembly).26 Since no intragenic markers were available for the NLGN3 gene, we generated two novel markers named ms.NLGN3-3 and ms.NLGN3-4 using the Baylor College of Medicine (BCM) Sequence Launcher's repeat masker algorithm27 (http://searchlauncher.bcm.tmc.edu/). The PCR reactions were performed in 15 μl reaction volumes in 96-well plates, as described elsewhere.24 The forward primers were labeled at the 5′ end with 6-FAM, VIC, NED or PET fluorescent dye. The PCR products were pooled and electrophoresed on an ABI3730 DNA sequencer (Applied Biosystems), and the genotypes were assigned using the GENEMAPPER 3.0 software (Applied Biosystems). Genotype errors were checked using the PEDCHECK 1.1 computer program.28
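The idea behind a genotype error check of this kind can be illustrated with a minimal sketch for one autosomal microsatellite in a parent-offspring trio. This is not the PEDCHECK algorithm, which handles full pedigrees and missing genotypes; the function name and the allele sizes are hypothetical and serve only as an illustration.

```python
def trio_is_mendelian(father, mother, child):
    """Return True if the child's genotype is compatible with both parents.

    Each genotype is a pair of allele labels (for example fragment sizes in bp)
    at a single autosomal marker. Illustrative only; not PEDCHECK's algorithm.
    """
    a, b = child
    # One child allele must be present in the father and the other in the mother.
    return (a in father and b in mother) or (b in father and a in mother)


# Hypothetical allele sizes for one microsatellite marker.
print(trio_is_mendelian((152, 156), (150, 156), (152, 150)))  # compatible trio
print(trio_is_mendelian((152, 156), (150, 156), (154, 150)))  # allele 154 is in neither parent
```

X-chromosomal markers would need a sex-aware version of this check, since males carry a single maternal allele.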

SNP genotyping

We used the UCSC Human Genome Browser, Celera Discovery System (http://www.celeradiscoverysystem.com), and dbSNP (http://www.ncbi.nlm.nih.gov/SNP) for selecting the SNPs. For SNP genotyping we used an in-house developed allele-specific primer extension-based microarray method.29 For each SNP, we designed two allele-specific extension primers containing 5′ amine modification and 3′ nucleotide defining the allele. Microarrays were manufactured on aminosilane-coated microscope slides by dispensing nanoliter volumes of 20 μM allele-specific extension primers in 0.4 M sodium acetate buffer (pH 9) on the slide surface. The multiplex PCR reactions were designed using a web-based system accessible at http://apps.bioinfo.helsinki.fi/mpd. A T7-RNA polymerase promoter sequence (TAATACGACTCACTATAGGGAGA) was added to the 5′ end of either the forward or reverse primer. The PCR reactions included 15 ng of template DNA, 200 μM of dNTPs, 0.8 U of AmpliTaq Gold DNA polymerase (Applied Biosystems), 1 × AmpliTaq buffer, and 3 pmol of each primer. A 2.0 μl aliquot of multiplex PCR product carrying a 5′ T7-RNA polymerase promoter sequence was used as the template for in vitro transcription by Ampliscribe T7 RNA polymerase (Epicentre Technologies).

RNA was treated for 15 min at 37°C with DNase I enzyme (Epicentre Technologies) followed by 15 min at 65°C to yield clean single-stranded RNA templates. About 2 μl of the sample in 1.6 M NaCl was hybridized on subarrays containing allele-specific oligos. Hybridization was performed in a humid chamber at 42°C for 20 min. The slides were washed in a buffer (0.5X TE, 0.3 M NaCl, 0.1% Triton X-100), rinsed with cold dH2O, and dried by pressurized air. Primer extension reaction mix (1 × MMLV-RT buffer; 2 U MMLV-RT enzyme [Epicentre Technologies]; 0.5 mM dATP, dGTP, ddATP, and ddGTP; 1.0 mM Cy5-dCTP and Cy5-dUTP; in 0.5 mM trehalose/glycerol) was applied on each subarray and the extension reaction was performed by incubating the slides at 52°C for 20 min. Washing was performed in a similar manner as after hybridization and the slides were scanned with a ScanArray4000 laser scanner instrument (Perkin Elmer Life and Analytical Sciences). The signals were quantified by QuantArray 3.0 software (Packard Bioscience) and the genotypes were assigned by using SNPSnapper (v.3.88b) analysis program (http://www.bioinfo.helsinki.fi/SNPSnapper/). Mendelian inconsistencies were checked for using the PEDCHECK 1.1 computer program28 and Hardy–Weinberg calculations were performed to confirm that each marker was in Hardy–Weinberg equilibrium in our sample set.
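A Hardy–Weinberg check of the kind mentioned above can be sketched as a one-degree-of-freedom chi-square test on genotype counts for a biallelic autosomal SNP. The text does not state which test was used, so the chi-square form, the function name, and the genotype counts below are assumptions made for illustration.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) for deviation from Hardy-Weinberg proportions.

    Arguments are observed counts of the three genotypes at a biallelic
    autosomal SNP. Illustrative sketch; the study's exact test is not stated.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


CRITICAL_1DF_005 = 3.841  # chi-square critical value for 1 df at alpha = 0.05

# Hypothetical genotype counts.
for counts in [(25, 50, 25), (50, 0, 50)]:
    stat = hwe_chi_square(*counts)
    print(counts, round(stat, 3), "deviates" if stat > CRITICAL_1DF_005 else "consistent with HWE")
```

For X-chromosomal markers such a test would be applied to females only, because males are hemizygous.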

Statistical analyses

The two-point LOD scores were calculated by the MLINK program of the LINKAGE package using both dominant and recessive models allowing heterogeneity.30, 31 We used the program ANALYZE to conduct these analyses.32 The allele frequencies for each marker were derived by the DOWNFREQ 2.1 program from the genotypes of all individuals who were genotyped in the study. The extent of linkage disequilibrium between SNPs in the NLGN1 locus was estimated with the HaploView program.33 For family-based association analysis, we used the PSEUDOMARKER analysis program and applied the "LD given linkage" option with both dominant and recessive models.34 In addition, FBAT was used for pairwise association analyses and to perform haplotype association analyses for autosomal markers.35 The empirical variance option of FBAT was used, as appropriate in the presence of linkage and when data for multiple sibs in a family were available. An affected-only approach was taken in all the analyses; in other words, the individuals were coded either as affected or as unknown.
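For readers unfamiliar with the statistic, the two-point LOD score can be illustrated in its simplest textbook form, where the phase is known and recombinant meioses can be counted directly. MLINK evaluates full pedigree likelihoods under a specified disease model, so the sketch below is not a reimplementation of it; the function names and the counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known meioses at recombination fraction theta.

    LOD(theta) = log10[ theta^r * (1 - theta)^(n - r) / 0.5^n ]
    Simplified textbook form; pedigree-based programs such as MLINK do far more.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    nonrecombinants = meioses - recombinants
    log_l_theta = recombinants * math.log10(theta) + nonrecombinants * math.log10(1.0 - theta)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants, meioses):
    """Return (Zmax, theta) maximized over a grid of theta values 0.01..0.50."""
    grid = [i / 100 for i in range(1, 51)]
    return max((lod_score(recombinants, meioses, t), t) for t in grid)


# Hypothetical example: 1 recombinant among 10 informative meioses.
zmax, theta_hat = max_lod(1, 10)
print(f"Zmax = {zmax:.2f} at theta = {theta_hat:.2f}")
```

The maximum over theta corresponds to the Zmax values reported for each marker, although the reported values come from model-based pedigree likelihoods rather than from counted recombinants.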

Results

Mutation analysis

Two coding sequence variants, K494K (1482G>A) and P818P (2454C>T), were identified in NLGN1, but both of these are silent mutations, which do not affect protein structure. In addition, three intronic variants and one 5′UTR variant were identified as shown in Table 2. None of the intronic variants were predicted to affect splicing and, thus, none of the variants identified in NLGN1 seems to be functional. We could not establish any sequence variants in NLGN4 or NLGN4Y and only one rare variant was found in NLGN3. This was a silent Y74Y (222C>T) mutation in the first protein coding exon present in one affected male. The 222C>T variant of NLGN3 was the only identified variant that cosegregated completely with the phenotype in the family. Only two of the identified variants, 493-45A>G (rs3853390) and 1482G>A (rs7646919) in NLGN1, were present in several affected individuals, and these were the only variants available in dbSNP. Both of these SNPs were also included in the genotyping stage. Taken together, no obvious functional mutations were identified in any of the genes analyzed.

Table 2 Sequence variants identified in the mutation analysis

Statistical analyses

A total of 24 markers were analyzed for the NLGN1 locus (3q26), of which three yielded Zmax>2.0, the best marker being D3S2421 (Zmax=2.58, LC1, dominant model). This marker is located 5.1 cM proximally to the best marker D3S3037 in the original genome scan and 1 Mb downstream of NLGN1.24 In addition, several other markers yielded Zmax>1.0, suggesting that the linked chromosomal region on 3q25–27 extends to overlap with NLGN1 (Table 3). For the NLGN3 locus (Xq13), we genotyped a total of eight markers. The best linkage evidence was obtained for DXS7117 located some 1.4 kb downstream of NLGN3 (Zmax=2.39, LC2, dominant model). In addition, all the other markers at NLGN3 except for CHLC.ATA37A12.P32845 yielded Zmax>1.0. By contrast, none of the four microsatellites analyzed at the NLGN4 locus (Xp22) yielded Zmax>1.0.

Table 3 Pairwise linkage and association results

Modest evidence for association was observed at five out of the 35 markers tested. The best association for NLGN1 was seen at marker rs1488545 (FBAT, P=0.002; PSEUDOMARKER, P=0.041). A nearby marker, rs1352416, which was in LD with rs1488545 (D′=1.0), also showed some trend towards association (FBAT, P=0.004; PSEUDOMARKER, P=0.07). However, the microsatellite D3S1565 located only 500 bp from rs1488545 did not show any evidence for positive association (Table 3). Similarly, the analysis of the two-marker haplotype constructed from rs1488545 and rs1352416 revealed less evidence for association compared with the evidence obtained for rs1488545 alone (rs1488545–rs1352416, P=0.01). Two of the X-chromosomal markers yielded nominal association in the dominant PSEUDOMARKER analysis. These were DXS7132 (P=0.014) located at the 5′ side of NLGN3 and DXS996 (P=0.031) located within NLGN4 (Table 3).

Four additional regions of elevated LD (D′>0.8) were observed in NLGN1. These included markers rs3853390–rs983303–rs1472647 (61 kb), hCV1176512–hCV1176480 (31 kb), rs1873039–rs1488547–hCV1176370 (44 kb), and hCV1196655–hCV3083907 (73 kb). The first block involved four common haplotypes with a frequency > 5%, whereas all the other blocks had three common haplotypes. We tested haplotype association within these blocks, but none of them yielded significant results (P>0.05). Data for 11 out of the 18 SNPs analyzed at the NLGN1 locus were available on the HapMap Web page (http://www.hapmap.org). The LD pattern in the HapMap data was similar to the LD structure in the current Finnish sample (data not shown).
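The D′ measure used to delimit these regions can be illustrated for a pair of biallelic SNPs when haplotype frequencies are taken as known. HaploView estimates haplotype frequencies from unphased genotypes before computing D′, a step this sketch omits; the function name and the frequencies below are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute normalized linkage disequilibrium |D'| between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    Assumes haplotype frequencies are known (no phasing step is performed here).
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


# Hypothetical frequencies: the A and B alleles always occur on the same haplotype.
print(round(d_prime(0.30, 0.30, 0.30), 3))
# Hypothetical frequencies: the alleles are statistically independent.
print(round(d_prime(0.09, 0.30, 0.30), 3))
```

A value of 1.0 means that at most three of the four possible two-SNP haplotypes are observed, which is the situation reported for rs1488545 and rs1352416.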

Discussion

Based on the results from our earlier genome-wide screen in the Finnish families with ASDs and the recent reports showing that mutations in neuroligin genes may lead to autism, we hypothesized that neuroligins might be particularly relevant candidates for autism in our sample.14, 15, 24 We have established a susceptibility locus for autism on 3q25–27, with the best marker, D3S3037, located only 3.4 Mb (4 cM) from NLGN1. Also, the Xq13 locus containing the NLGN3 gene was among the most promising loci observed, with the best marker, DXS7132, residing 5.7 Mb (<4 cM) from NLGN3.24 Such distances of a few Mb are often seen between the best linkage markers and the actual disease-associated genes, especially in complex diseases for which the parameters provided for statistical tests are not accurate. In the current study, we analyzed the involvement of NLGN1 and NLGN3 in the etiology of autism. Based on previously reported mutations in NLGN4, this gene and its Y-chromosomal homologue NLGN4Y were also included in the study.14, 15

We could not establish any functional mutations in any of the four neuroligin genes analyzed in 30 probands selected from families showing linkage evidence for Xq13 and/or 3q26 loci. This implies that the coding sequence variants do not explain the linkage observed in our data set. In the original report, the two probands with NLGN4 or NLGN3 mutations were identified in a sample of 158 patients.14 Recently, Vincent et al36 and Gauthier et al37 performed mutation screening of NLGN3/NLGN4 genes by using samples of 196 and 96 autistic probands, respectively. No causative mutations were identified in either of the samples. In addition, one recent study has reported missense mutations in NLGN4 but with no confirmed etiological significance.38 Therefore, the currently existing data indicate that the coding sequence mutations of neuroligins are extremely rare causes of autism (0.4%). However, it is important to note that with the current study design we cannot exclude the existence of rare neuroligin mutations in some autism cases also in the Finnish sample.
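The 0.4% figure is consistent with pooling the three screening samples cited above (2 mutation carriers among 158 + 196 + 96 probands), although the text does not state exactly how it was derived, so the pooling is an assumption. The sketch below reproduces that arithmetic and, as an added illustration, computes the exact one-sided 95% upper bound on the carrier frequency that remains compatible with observing no mutations among 30 probands; the function names are hypothetical.

```python
def carrier_fraction(carriers, screened):
    """Observed fraction of probands carrying a causative mutation."""
    return carriers / screened


def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper bound on a proportion when 0 of n are carriers.

    Solves (1 - p)^n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# Pooled samples cited in the text: 158, 196 and 96 probands, with 2 carriers in total.
pooled = carrier_fraction(2, 158 + 196 + 96)
print(f"pooled carrier fraction: {pooled:.2%}")

# No functional mutations among the 30 Finnish probands sequenced here.
print(f"95% upper bound with 0/30: {upper_bound_zero_events(30):.1%}")
```

The bound of roughly one in ten shows why a sample of 30 probands cannot rule out rare neuroligin mutations in the Finnish material.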

We want to emphasize that excluding the coding sequence mutations is not sufficient when evaluating the role of candidate genes in diseases with a complex genetic background. Therefore, we employed a dense set of microsatellite and SNP markers within and flanking the NLGN1, NLGN3, and NLGN4 genes and used a family-based association strategy to further dissect the role of these genes in a sample of 100 Finnish families with autism. A total of five markers yielded evidence for association at the P<0.05 level. Although these nominal associations may be worth following up in different autism samples, it is probable that they only reflect random statistical fluctuation. The best association evidence was obtained for rs1488545 located in the fourth intron of NLGN1. However, adding information by haplotype analysis or by analysis of a closely located microsatellite decreased the association evidence. Similarly, the association evidence obtained for two X-chromosomal markers was at best suggestive. It is not well established how P-values should be corrected for multiple testing when the tested markers are tightly correlated,39 but it is evident that none of the associations reported here would remain significant if such corrections were performed. It is equally clear that more stringent significance level criteria than the traditional P<0.05 are needed for convincing associations in the presence of multiple testing and low prior probability.40 Therefore, our findings do not provide strong evidence for the involvement of any of these genes in the etiology of autism in the Finnish material.
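The multiple-testing argument can be made concrete with a Bonferroni correction over the 35 markers tested. Bonferroni is used here only as an assumed, deliberately simple example: it is conservative when markers are correlated, and the text does not commit to any particular correction. The P-values are the smallest ones reported in the Results for each of the four markers named there; the fifth nominally associated marker is not named in the text and is therefore omitted.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Smallest nominal P-value reported in the text for each named marker.
nominal_p = {
    "rs1488545": 0.002,  # FBAT
    "rs1352416": 0.004,  # FBAT
    "DXS7132": 0.014,    # PSEUDOMARKER, dominant model
    "DXS996": 0.031,     # PSEUDOMARKER, dominant model
}

threshold = bonferroni_threshold(0.05, 35)
survivors = [marker for marker, p in nominal_p.items() if p < threshold]

print(f"Bonferroni threshold for 35 markers: {threshold:.5f}")
print("markers below threshold:", survivors)
```

Even the strongest signal, P=0.002 at rs1488545, lies above the corrected threshold of about 0.0014, in line with the conclusion that none of the associations would remain significant.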

The previously identified neuroligin mutations link the etiology of autism with components participating in synaptogenesis. At the postsynaptic side, neuroligins bind to PSD-95, which is involved, for example, in the localization of NMDA2 receptor and K+ channels to synapses.17 Both neuroligins and PSD-95 seem to have a role in adjusting the balance between excitatory and inhibitory synapses and it has been hypothesized that at least some forms of autism might be caused by an imbalance of neuronal excitation/inhibition.41, 42 Although the current data indicate that neuroligin mutations are implicated only in rare cases of autism, components involved in synaptogenesis and synaptic structures remain excellent functional candidates for future molecular genetic studies of autism and related disorders. Furthermore, functional analyses of the neuroligin pathway may eventually lead to better understanding of the pathophysiology of ASDs.