Introduction

Autism is a severe neurodevelopmental disorder characterised by impairments in reciprocal communication and social interaction, accompanied by unusually restricted and stereotyped patterns of behaviours and interests, and an onset in the first 3 years of life.1 The population prevalence of autism is approximately 10–30/10 000,2 with a male to female ratio of 4:1.3 When other pervasive developmental disorders (PDD) are also considered, the prevalence may be as high as 20–60 in 10 000 children.4,5 In several epidemiological studies of autism, the most consistent anatomical result is macrocephaly.6,7 Neuroanatomical findings point to abnormalities in the cerebral cortex, cerebellum, and brainstem.8,9,10

Twin and family studies have indicated a complex genetic predisposition to autism,3,11,12,13 and statistical models suggest that between two and 10 loci are implicated.14 Several genome scans for autism susceptibility loci have been completed, providing evidence that the long arm of human chromosome 7 is likely to contain an autism susceptibility locus (AUTS1) (reviewed in Folstein and Rosen-Sheidley15).

In order to identify autism susceptibility genes on 7q, we have systematically screened functional candidate genes mapping to the region of linkage for the presence of etiological mutations/variants. Here, we report the analysis of six such candidate genes with neuronal function: CUTL1, SRPK2, SYPL, LAMB1, NRCAM, and PTPRZ1.

CUTL1 (Cut-like 1) is the human homologue of the Drosophila melanogaster gene Cut, which has a role in determining and maintaining cell-type specificity.16 The full-length protein contains a homeodomain and acts as a repressor of transcription.17 One of the alternative forms, Cut alternative spliced product (CASP), lacks the DNA-binding domains and is a transmembrane protein of the Golgi system.18 The gene SRPK2 encodes serine arginine protein kinase isoform 2, one of the kinases specific for SR-rich splicing factors,19 with a brain-restricted expression pattern.20 SYPL encodes synaptophysin-like protein, a major integral calcium-binding molecule required for vesicle fusion in synapses.21 LAMB1 encodes the β1 chain of laminin, an extracellular matrix (ECM) glycoprotein complex.22 Laminins promote neuronal migration and neurite outgrowth in the developing nervous system.23 NRCAM encodes the Bravo/NrCAM (NgCAM-related cell adhesion molecule) protein, a member of the immunoglobulin superfamily of cell adhesion proteins.24 NrCAM proteins promote directional signaling during axonal cone growth.25 PTPRZ1 encodes protein tyrosine phosphatase receptor type Z, a transmembrane protein expressed primarily in the CNS, during development and in adult brain.26

Materials and methods

IMGSAC multiplex and singleton families

The identification of families, assessment methods, and inclusion criteria used by the IMGSAC have been described previously.27 In families passing an initial screen, parents were administered the ADI-R28 and the Vineland Adaptive Behaviour Scales.29 Potential cases were assessed using the ADOS.30 Physical examination was undertaken to exclude recognizable medical causes of autism, particularly tuberous sclerosis. Karyotyping was performed when possible on all affected individuals and molecular genetic testing for Fragile X performed on one case per family.27 Families have been collected in six successive waves for a total of 207 families comprising 219 nonindependent affected sibling pairs (ASP) (145 male–male ASP, 59 male–female ASP and 15 female–female ASP).

The identification and assessment of IMGSAC singletons was similar to that of the multiplex families; a total of 98 singleton families from the UK, Netherlands and Denmark were included in the study. Also included were 42 Italian singleton families and 112 German singleton families. The German families were subdivided into group A (63 male and 21 female cases) and group B (24 male and four female cases), with individuals from group B showing no delay in the development of language.31 Written informed consent was given by all parents/guardians and, where possible, by affected individuals. The study has been approved by the relevant ethical committees.

Gene characterization

The genomic structure for each gene was obtained by BLAST comparison (http://www.ncbi.nlm.nih.gov/BLAST) of the coding mRNAs with the genomic sequence (TCAG website; http://www.chr7.org). Exon–intron boundaries were identified and primers designed to cover exons and regulatory splice site regions using the program Primer3 (http://www.genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). Promoter regions were determined using Promoterscan (http://zeon.well.ox.ac.uk). Sequences and PCR conditions of all primer pairs are available on request. CUTL1 covers a genomic region of 470 kb and comprises 33 exons. The full-length CUTL1 mRNA (Accession no. NM_181552) contains exons 1b–24. CASP mRNA (Accession no. NM_001913) contains exons 1a–14 and 25–33.32 SRPK2 (mRNA Accession no. NM_182691) extends over a genomic region of 153 kb, comprising 15 exons; SYPL (mRNA Accession no. NM_182715) covers a genomic region of 23 kb and is composed of six exons; LAMB1 (mRNA Accession no. NM_002291) extends over a region of 95 kb, comprising 34 exons. NRCAM covers a genomic region of 380 kb and contains 34 exons (Accession no. NM_005010). Different transcripts of NRCAM are produced by alternative splicing of exons 10, 19, and 27–29. PTPRZ1 covers a genomic region of 189 kb and contains 30 exons (Accession no. NM_002851).

Mutation screening by denaturing high-performance liquid chromatography (DHPLC)

Genomic DNA was extracted from blood as described previously.33 Genomic DNA extracted from buccal swabs was preamplified using GenomiPhi according to manufacturer's instructions (Amersham Pharmacia Biotech). PCR amplifications and DHPLC analysis were performed as described previously.34 Samples showing a variant DHPLC pattern were reamplified and sequenced on both strands using BigDye v3.0 (Applied Biosystems) according to the manufacturer's instruction to determine the nature of the heterozygous changes. Sequences were loaded on ABI377 sequencing machines (Applied Biosystems) and analyzed using Sequence Navigator v3.1.

Prediction analysis of amino-acid substitutions

PolyPhen (http://tux.embl-heidelberg.de/ramensky/polyphen.cgi) was used to predict the possible impact of amino-acid substitutions on the protein. The program is based on sequence comparison with homologous proteins; profile scores, position-specific independent counts (PSIC) are generated for the allelic variants and represent the logarithmic ratio of the likelihood of a given amino-acid occurring at a particular site relative to the likelihood of this amino-acid occurring at any site (background frequency). PSIC score differences above 2 indicate a damaging effect; scores between 1.5 and 2 suggest that the variant is possibly damaging, whereas scores below 0.5 indicate that the variant is benign.35
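The thresholds quoted above can be written out as a small decision rule. This is only a sketch of the cut-offs as stated in the text, not PolyPhen's own implementation; the function name `classify_psic` is hypothetical, and the label returned for score differences from 0.5 up to 1.5 is a placeholder because the text assigns no category to that range.

```python
def classify_psic(score_difference: float) -> str:
    """Map a PSIC score difference to the categories quoted in the text."""
    if score_difference > 2.0:
        return "damaging"
    if 1.5 <= score_difference <= 2.0:
        return "possibly damaging"
    if score_difference < 0.5:
        return "benign"
    # The text gives no label for differences from 0.5 up to 1.5,
    # so this placeholder is an assumption of the sketch.
    return "not classified in the text"


if __name__ == "__main__":
    for value in (0.3, 1.0, 1.7, 2.166):
        print(value, classify_psic(value))
```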

Single-nucleotide polymorphism (SNP) genotyping

The insertion in exon 12 of PTPRZ1 was fluorescently genotyped on ABI377 sequencing machines, as described previously.27 The SNPs in intron 3 of CUTL1, exon 20 of LAMB1, and exon 5 of PTPRZ1 were genotyped by restriction digestion using the enzymes TaqI, AluI, and AciI (New England Biolabs), respectively, according to standard protocols. In the German and Italian singleton samples, the missense change in exon 30 of LAMB1 was genotyped by restriction digestion using the enzyme AflIII. In the IMGSAC sample, the missense change in exon 30 of LAMB1 was genotyped using the MassARRAY™ primer extension system (see below). The SNP in exon 1 of NRCAM could not be distinguished by a commercial restriction enzyme; therefore, mismatch primers inserting a BsaJI site were created using Insizer (http://zeon.well.ox.ac.uk).

MassARRAY™ primer extension

In total, 23 SNPs selected from the SNP consortium (http://snp.cshl.org) and 17 SNPs identified in our mutation screening were genotyped using the MassARRAY™ system. Genotyping assays were designed using Sequenom's SpectroDESIGNER™ software (Version 1.3.4). Multiplex PCR amplifications were performed in 384-well plates in a final volume of 10 μl using 24 ng of genomic DNA as described previously.36 Primers and conditions are available on request. Genotyping was performed using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) technology with the Bruker Biflex III Mass Spectrometer system, as described previously.36 Genotypes were assigned using the SpectroTYPER™ software.

Error checking

The LIMS Integrated Genotyping System database was used to store all genotypic and phenotypic data and to produce files for statistical analysis (http://bioinformatics.well.ox.ac.uk/project-lims.shtml). Genotypes were checked for Mendelian consistency using PedCheck.37 Haplotypes were constructed using Genehunter v2.0 and, in cases where apparent excess recombination was observed, genotypes were rechecked and corrected where necessary.

Prior to statistical analysis, SIBMED was run on data from the multiplex families to identify any remaining possible genotyping errors38 using a false-positive rate of 0.001 and a prior genotyping error rate of 0.01. All the SNPs were tested for Hardy–Weinberg equilibrium.
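The text does not say which Hardy–Weinberg test was applied. As one common choice, the sketch below uses a 1-df chi-square goodness-of-fit test on the three genotype counts of a biallelic SNP. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) for observed genotype counts AA, AB, BB.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic markers).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative counts only.
    print(hwe_chi_square(25, 50, 25))
    print(hwe_chi_square(50, 0, 50))
```

For markers with small expected genotype counts an exact test is usually preferred over this approximation.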

Association analysis

Association was studied using the transmission disequilibrium test (TDT)39 with the sib_tdt option from ASPEX v2.3.40 This program calculates probabilities for χ2 statistics by permuting parental alleles while fixing the IBD status of siblings within a family, thereby allowing the use of multiple siblings within a nuclear family. Further analysis at the AUTS1 locus has suggested that linkage derives mainly from the male ASP, and parent-of-origin linkage modelling indicates two distinct regions of paternal and maternal linkage on chromosome 7 (IMGSAC, unpublished data); therefore, transmissions to male–male ASP and parental transmissions were also examined.
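For orientation, the basic TDT statistic compares how often heterozygous parents transmit an allele with how often they do not. The sketch below computes the classical statistic with an asymptotic 1-df chi-square P-value. It is not the ASPEX sib_tdt procedure, which obtains probabilities by permutation, and the function name `tdt` is hypothetical.

```python
import math
from typing import Tuple


def tdt(transmitted: int, not_transmitted: int) -> Tuple[float, float]:
    """Classical TDT: chi2 = (b - c)^2 / (b + c), with 1 degree of freedom.

    b = transmissions of the allele from heterozygous parents,
    c = non-transmissions of the same allele.
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - not_transmitted) ** 2 / total
    # Asymptotic P-value from the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Paternal transmission counts for the LAMB1 exon 30 variant (see Results).
    chi2, p = tdt(15, 4)
    print(f"chi2 = {chi2:.3f}, P = {p:.4f}")
```

With the paternal counts given in the Results (15 transmissions versus four non-transmissions), this asymptotic calculation gives a P-value close to the reported 0.0116.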

Linkage disequilibrium

The extent of linkage disequilibrium (LD) between intragenic SNPs was studied using the Haploxt program41 and characterised with Lewontin's standardised measure of disequilibrium D′.42
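Lewontin's D′ rescales the raw disequilibrium coefficient D by its maximum possible magnitude given the allele frequencies. The sketch below assumes that phased two-locus haplotype counts are already available; it does not reproduce how Haploxt estimates haplotype frequencies, and the function name and example counts are hypothetical.

```python
def lewontin_d_prime(n_ab: int, n_aB: int, n_Ab: int, n_AB: int) -> float:
    """Signed D' for two biallelic loci from phased haplotype counts.

    Arguments are counts of haplotypes a-b, a-B, A-b and A-B.
    """
    n = n_ab + n_aB + n_Ab + n_AB
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_Ab + n_AB) / n           # frequency of allele A at locus 1
    p_B = (n_aB + n_AB) / n           # frequency of allele B at locus 2
    d = n_AB / n - p_A * p_B          # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        return 0.0                    # a monomorphic locus carries no LD information
    return d / d_max


if __name__ == "__main__":
    # Illustrative counts only: a-b, a-B, A-b, A-B.
    print(lewontin_d_prime(40, 10, 10, 40))
```

Published LD plots usually show the absolute value |D′|; the sign is kept here so that the direction of the allelic association is not lost.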

Haplotype analysis

Haplotypes were reconstructed for all 219 ASP using MERLIN.43 Haplotypes were recoded as single markers, with each haplotype combination considered as a different ‘allele’. Transmission was studied using the sib_tdt option from ASPEX v2.3. Haplotype transmission was analysed for SNPs showing nominally significant association at the single-locus level and flanking markers.
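Recoding haplotypes as alleles of a single multiallelic marker can be sketched as follows. The representation of a haplotype as a tuple of allele letters and the integer codes assigned in order of first appearance are assumptions made for illustration; the file format actually passed to ASPEX is not described in the text.

```python
from typing import Dict, List, Sequence, Tuple


def recode_haplotypes(
    haplotypes: Sequence[Sequence[str]],
) -> Tuple[List[int], Dict[Tuple[str, ...], int]]:
    """Give each distinct haplotype an integer 'allele' code.

    Codes start at 1 and follow the order in which haplotypes first appear.
    Returns the recoded list and the haplotype-to-code mapping.
    """
    codes: Dict[Tuple[str, ...], int] = {}
    recoded: List[int] = []
    for hap in haplotypes:
        key = tuple(hap)
        if key not in codes:
            codes[key] = len(codes) + 1
        recoded.append(codes[key])
    return recoded, codes


if __name__ == "__main__":
    # Illustrative two-SNP haplotypes only.
    alleles, mapping = recode_haplotypes([("G", "G"), ("A", "G"), ("G", "G"), ("A", "C")])
    print(alleles)
    print(mapping)
```

The recoded marker can then be analysed with the same transmission test as a single SNP, treating each code as one allele.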

Results

Mutation screening

A total of 48 (46 males, two females) unrelated individuals with autism from the multiplex IMGSAC families were screened for sequence variants by DHPLC. Individuals were selected from families showing increased identical by descent (IBD) sharing in the region surrounding the candidate genes, where ASP were IBD1 or IBD2 across a ∼15 Mb region containing 15 microsatellite markers. In total, 38 individuals had a clinical diagnosis of autism, met ADI-R and ADOS criteria for autism, and had a history of language delay and a performance IQ⩾35; the other individuals met the criteria for PDD or Asperger syndrome. We identified a total of 112 sequence variants: 26 in CUTL1, six in SRPK2, two in SYPL, 32 in LAMB1, 25 in NRCAM, and 21 in PTPRZ1. Comparison with dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) identified 39 changes as known SNPs. Nine changes led to amino-acid substitutions, insertions or deletions. The frequency and positions of the changes identified through our screening are shown in Table 1 and Supplementary Tables 1 and 2 (see Supplementary Tables online). The presence of all missense variants and insertion–deletions was tested in a control group of 192 random Caucasian individuals from the European Collection of Cell Cultures (ECACC). Differences in frequencies of heterozygous individuals in cases (48 individuals) and controls (192 individuals) were calculated using Fisher's exact test.

The deletion of amino-acid K1256 in CUTL1 was not identified in 192 controls. It is located in the homeodomain and maps to a conserved position in Cut proteins. Analysis of the crystal structure of homologous proteins in complexes with DNA suggests that K1256 may interact with the deoxyribose-phosphate backbone (R Esnouf, personal communication). This change is transmitted from the father to all three sons with autism, and not to the unaffected brother, but also to the unaffected sister. Phenotypic investigation of all family members showed that the parents, the non-autistic son, and daughter present some difficulties in socio-emotional interactions and/or circumscribed interests. However, since both the father and son appear to have the broader autism phenotype,44 this variant does not always segregate with the phenotype in the family (see Supplementary Figure 1 online). The deletion was investigated in 342 individuals with autism from 169 multiplex families, and was not identified in other subjects.
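The case–control comparison of heterozygote frequencies relies on Fisher's exact test for a 2×2 table. A minimal two-sided version is sketched below, summing the probabilities of all tables that are no more likely than the observed one. The function name is hypothetical and the counts in the usage line are a textbook-style illustration, not data from this study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns carriers/non-carriers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k carriers in row 1, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    total = sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Illustrative counts only.
    print(fisher_exact_two_sided(8, 2, 1, 5))
```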

Table 1 Nonconservative coding changes identified in the mutation screening of CUTL1, LAMB1, NRCAM, and PTPRZ1

LAMB1 and PTPRZ1 coding changes

Four missense changes were identified in LAMB1 in exons 19, 20, 22, and 30. The G3400A (R1022Q) in exon 22 (rs20556) and G2913A (G860S) in exon 20 were recently described;45 C2718G (R795G) in exon 19, and T4975C (I1547T) in exon 30 are new variants. Comparison with homologous proteins in different species (Figure 1) shows that I1547 is conserved, whereas for R795 a positively charged residue is always present. In silico methods have been developed to predict the potential of amino-acid substitutions to impact protein structure and activity, based on sequence conservation, physical and chemical properties of the exchanged residues, and/or protein structural domain information. We used PolyPhen (Polymorphism Phenotyping), one of these algorithms, to assess whether the new variants might have a functional role. PolyPhen is based on all of these characteristics, and also takes into account the location of the substitution within identified functional domains and known structural features available in the annotated database (SwissProt). Testing PolyPhen using known variants confirmed its ability to discriminate between benign and deleterious variants and its high concordance with other algorithms.46

Figure 1

Multiple sequence alignment of laminin β1 from different species (Accession No: NP002282 (H. sapiens), AAA39407 (Mus musculus), NP775382 (D. rerio), NP500734 (C. elegans)) across the regions with new missense changes. Completely conserved residues are shaded in black, partially conserved residues in grey. Arrows indicate the substituted amino-acids identified through the mutation screening.

The I1547T change was predicted to be damaging (PSIC score 2.166). The frequency of this variant was higher in the autistic sample compared to the control group (10.4% versus 3.66%, respectively, P=0.0466; Table 1) and it cosegregated with the autism phenotype in the families included in the mutation screening.

The phenotypic characteristics of autistic individuals carrying the missense change in exon 30 were compared with the other individuals included in LAMB1 screening who did not carry the change, to identify a possible distinguishing phenotype, but no significant differences were identified.

The T155G change in exon 1 of PTPRZ1 was found with increased frequency in the autistic cases compared to controls (29.2 versus 13.8%, respectively, P=0.0201), but when the analysis was extended to the relatives, the change was found to segregate with the autism phenotype only in 46% of the families. In addition, the change (I3S) resides in the N-terminal signal domain of the protein and no negative effect on the protein is predicted for this variant.

Association studies

A total of 45 SNPs were genotyped across the candidate genes and tested for association (Table 2). Association analysis was performed using the TDT implemented in ASPEX. Analysis was performed on the whole sample of 219 ASP and in the subset of 145 male–male pairs, considering the sex of the individuals as a possible index of heterogeneity (IMGSAC, unpublished data). Combined P-values for all SNPs are reported in Table 2. Transmissions for SNPs showing a nominally significant P-value are reported in Table 3. The I1547T variant in LAMB1 (LAMB1 × 30, C allele) showed a preferential paternal transmission (15 transmissions versus four not transmitted, P=0.0116; combined parental, P=0.0112) in all families. This association is more significant in the male ASP (13 paternal transmissions versus 0 not transmitted, P=0.0003, combined parental P=0.0016). In NRCAM, three SNPs showed transmission disequilibrium (Table 3). Association analysis in the 219 ASP showed a combined preferential transmission of allele A for SNP rs1269622 (combined P=0.0135); preferential transmission of allele G for SNP rs722519 (paternal P=0.0047; combined P=0.0078) – maternal transmission showed the same trend, although not reaching statistical significance – and preferential paternal transmission of allele G (P=0.0045; combined P=0.0668) for SNP rs3763463. Nominally significant association was detected for the SNP in intron 2 of SYPL in all families (combined parental, P=0.0212, maternal P=0.0278); however, given the number of tests performed, this result may have arisen by chance.

Table 2 SNP ID, position, sequences and TDT P-values of all the markers genotyped across the candidate genes
Table 3 TDT results for the SNPs showing association in SYPL, LAMB1, and NRCAM

LD patterns

The profile of LD across each gene was studied using the Haploxt program and the Lewontin D′ standardised measure of LD. Extensive LD is present between markers rs436287 and rs1297632 in CUTL1 and across SRPK2. In SYPL, high LD was detected between the intron 2 SNP and marker rs176501 (D′=0.839). No significant LD was found across PTPRZ1 (data not shown). The profile of LD across LAMB1 and NRCAM is shown in Figure 2.

Figure 2

Patterns of LD for LAMB1 (a) and NRCAM (b) genes. Pairwise estimation of D′ is shown. Arrows indicate SNPs showing association.

Haplotype transmission study

Haplotype analysis was performed to further characterise the SNPs showing evidence for association. The determination of haplotypes of SNPs in LD offers more power to detect association than testing of SNPs individually.47

Haplotypes A–A for markers SYPLint2-rs2891878 and C–A for markers LAMB1 × 30 and LAMB1int24 were overtransmitted, but statistical significance was not increased compared to single markers (data not shown).

In NRCAM, analysis was first performed for the two markers showing evidence of association, SNPs rs722519 and rs3763463. rs3763463 and rs722519 are located in the promoter and in the second intron of NRCAM, respectively. Subsequent analysis was extended to two, three, and four SNP haplotypes including markers rs1990162 and rs917251 between the two SNPs (SNP rs216055 was not included, since it was not in LD with marker rs722519). Significant preferential transmission was detected for the G–G haplotype of markers rs722519 and rs3763463 (P=0.0005; Table 4) and for haplotype G–C–G of rs722519–rs917251–rs3763463 (P=0.0001). Analysis of haplotypes containing the A allele of SNP rs1269622 did not increase statistical significance compared to the single-marker test (data not shown).

Table 4 Haplotype transmission disequilibrium results for NRCAM

Replication in singleton families

The five SNPs showing evidence for association were further studied in an independent sample of 98 IMGSAC singleton families. The intron 2 SNP in SYPL and SNPs rs1269622, rs722519, and rs3763463 in NRCAM did not provide evidence for association (P>0.05; data not shown). A nonsignificant trend for a preferential transmission of the C allele was observed for the exon 30 variant in LAMB1 (six transmissions versus one nontransmission, P=0.0588). This missense change was further studied in 154 individuals with autism (42 singletons from Italy, 112 singletons from Germany) and was identified at a frequency similar to the control group (data not shown).

Discussion

A defining feature of complex phenotypes is that no single locus contains alleles that are necessary or sufficient for disease susceptibility, but little is known about the nature of genetic variation underlying human complex diseases. One problem is identifying whether genetic variance is due to a small number of loci where susceptibility alleles are common, or due to a much larger number of loci where susceptibility alleles are quite rare.48 If allelic variation at a complex disease locus is extensive, with multiple susceptibility alleles of independent origin present, the detection of association between marker genotypes and disease phenotype might be negatively affected.49 Several independent linkage studies point to the presence of an autism locus on chromosome 7q (reviewed in Folstein and Rosen-Sheidley15). Nonetheless, the region of interest is very broad and contains at least 190 known genes.50 Further characterization of the AUTS1 in 219 ASP suggests that linkage may derive mainly from the male ASP, and parent-of-origin linkage modelling indicates two distinct regions of paternal and maternal linkage (IMGSAC, unpublished data). A combination of approaches has been undertaken to identify the chromosome 7q autism susceptibility gene(s): a candidate gene-screening approach, focused on the direct detection of susceptibility variants even with low frequency, and family-based association studies on the genes tested for mutation, in order to detect association even if the susceptibility alleles were not directly tested.

A novel missense variant (I1547T) in exon 30 of LAMB1 showed a preferential paternal transmission of the rare new allele. A possible involvement of this variant in autism susceptibility is intriguing, as amino-acid I1547 is conserved from Caenorhabditis elegans to Homo sapiens, and this change is predicted to have a damaging effect on protein structure. In addition, this effect is more marked when considering only the male ASP, in concordance with the linkage findings. However, the frequency of this SNP in the multiplex families does not explain all the linkage at the AUTS1 locus and suggests that it might play a role only in a subset of individuals with autism. Interestingly, in an independent sample of 98 singleton families the same trend for preferential transmission of the change was observed, although no difference in frequency was detected in a second independent sample of 154 singleton cases compared to controls.

Association was identified for SNPs in the promoter (rs3763463) and noncoding regions of NRCAM (rs722519; rs1269622), and association was more significant when considering the haplotype transmissions. Analyses of SNPs using MatInspector (http://genomatix.de) revealed several potential binding sites for transcription factors, depending on the allele present (data not shown). These SNPs might affect regulatory regions, suggesting a potential alteration in NRCAM expression. Loss or abnormal expression of neuronal cell adhesion molecules leads to several neuronal defects.51 Defects in neuronal organisation have been reported in several post-mortem studies of brains of individuals with autism9 and there is preliminary evidence of abnormal axonal connectivity (P Luthert, A Dean and AJ Bailey, unpublished data).

The possible role of LAMB1 and NRCAM, both with important roles in brain function, in autism susceptibility is intriguing but not clear. In LAMB1 we detected association for a paternally transmitted rare allele; in NRCAM association derives from common alleles and haplotypes. Association between a complex disorder and common alleles has been shown for type II diabetes and the calpain-10 gene.52 Recently, Hutcheson et al45 reported the analysis of several genes mapping to chromosome 7 in autism, including LAMB1 and NRCAM, and presented evidence for positive association in LAMB1 in their sample. Although they did not report the presence of the missense change in exon 30, its detection might have been hampered by the relatively small number of individuals with autism screened for mutations. Nonetheless, the presence of association in the same gene reported by an independent group corroborates our results in LAMB1 and suggests that further analysis of this gene in autism is warranted. The lack of association for NRCAM in the work of Hutcheson et al, compared to ours, might be due to differences in the sample collection such as the inclusion–exclusion criteria, the use of different analytical approaches and markers, or the different size of the family collections. The TDT approach we used is robust to confounder effects, which may hamper the success of classical case–control studies, and is an effective method to refine the localization of a susceptibility locus in a region of linkage, in the presence of LD.53 When testing multiple marker loci, a correction such as the Bonferroni correction should be considered when computing P-values. However, applying any correction for tightly linked markers would be too conservative;54 thus, in the present study, no correction was applied.
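For reference, the Bonferroni correction mentioned above divides the nominal significance level by the number of tests or, equivalently, multiplies each P-value by that number. The sketch below only illustrates the arithmetic; the function names are hypothetical, and the test count of 45 simply echoes the number of SNPs genotyped, not the full number of tests performed in the study.

```python
from typing import List, Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted P-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Illustrative only: 45 tests at a nominal level of 0.05.
    print(bonferroni_threshold(0.05, 45))
    print(bonferroni_adjust([0.01, 0.5, 0.04]))
```

Because the correction treats all tests as independent, it becomes increasingly conservative as LD between the tested markers increases, which is the concern raised in the text.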

In conclusion, the most interesting results we found are for LAMB1 and NRCAM, whereas no evidence for a role of CUTL1, SRPK2, SYPL and PTPRZ1 in autism susceptibility was detected. Replication of a finding in independent samples of affected individuals is always warranted to provide evidence for a true association. The recent report of association between autism and variants in LAMB1 offers the first support for our findings and for a role of LAMB1 in autism. However, since it is recognised that contributions of multiple genes are probably required for autism, further studies are necessary to better clarify the role of LAMB1 and NRCAM in neuronal development in relation to autism susceptibility.