An entropy-based genome-wide transmission/disequilibrium test

Zhao, Jinying; Boerwinkle, Eric; Xiong, Momiao

doi:10.1007/s00439-007-0322-6

Original Investigation
Published: 13 February 2007

Human Genetics, Volume 121, pages 357–367 (2007)


Abstract

The availability of a large collection of single nucleotide polymorphisms (SNPs) and efficient genotyping methods enables the extension of linkage and association studies for complex diseases from small genomic regions to the whole genome. Establishing genome-wide significance for linkage or association requires very small P-values. The original TDT statistic compares linear functions of the numbers of transmitted and nontransmitted alleles or haplotypes. In this report, we introduce a novel TDT statistic that uses Shannon entropy as a nonlinear transformation of the frequencies of the transmitted and nontransmitted alleles (or haplotypes), amplifying the difference between them in order to increase statistical power when a large number of marker loci are tested. The null distribution of the entropy-based TDT statistic and its type I error rates in both homogeneous and admixed populations are validated in a series of simulation studies. Using analytical methods, we show that the entropy-based TDT statistic is more powerful than the original TDT and that the gain in power increases with the number of marker loci. Finally, the entropy-based TDT statistic is applied to two real data sets to test the association of the RET gene with Hirschsprung disease and of the Fcγ receptor genes with systemic lupus erythematosus. The results show that the entropy-based TDT statistic can attain P-values small enough to establish genome-wide significance in linkage or association analyses.

References

Borrego S, Ruiz A, Saez ME, Gimm O, Gao X, Lopez-Alonso M, Hernandez A, Wright FA, Antinolo G, Eng C (2000) RET genotypes comprising specific haplotypes of polymorphic variants predispose to isolated Hirschsprung disease. J Med Genet 37:572–578
Bourgain C, Genin E, Margaritte-Jeannin P, Clerget-Darpoux F (2001) Maximum identity length contrast: a powerful method for susceptibility gene detection in isolated populations. Genet Epidemiol 21(Suppl 1):S560–S564
Clayton D, Jones H (1999) Transmission/disequilibrium tests for extended marker haplotypes. Am J Hum Genet 65:1161–1169
Edberg JC, Langefeld CD, Wu J, Moser KL, Kaufman KM, Kelly J, Bansal V, Brown WM, Salmon JE, Rich SS, Harley JB, Kimberly RP (2002) Genetic linkage and association of Fcgamma receptor IIIA (CD16A) on chromosome 1q23 with human systemic lupus erythematosus. Arthritis Rheum 46:2132–2140
Ewens WJ, Spielman RS (1995) The transmission/disequilibrium test: history, subdivision, and admixture. Am J Hum Genet 57:455–464
Freimer N, Sabatti C (2004) The use of pedigree, sib-pair and association studies of common diseases for genetic mapping and epidemiology. Nat Genet 36:1045–1051
Graybill FA (1976) Theory and application of the linear model. Duxbury Press, North Scituate
Hampe J, Schreiber S, Krawczak M (2003) Entropy-based SNP selection for genetic association studies. Hum Genet 114:36–43
Lehmann EL (1983) Theory of point estimation. Wiley, New York
Nothnagel M (2002) Simulation of LD block-structured SNP haplotype data and its use for the analysis of case-control data by supervised learning methods. Am J Hum Genet 71(Suppl 4):A2363
Rabinowitz D, Laird N (2000) A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered 50:211–223
Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517
Schaid DJ (1996) General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol 13:423–449
Sham PC (1997) Transmission/disequilibrium tests for multiallelic loci. Am J Hum Genet 61:774–778
Sham PC, Curtis D (1995a) An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet 59:323–336
Sham PC, Curtis D (1995b) An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet 59(Pt 3):323–336
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423
Spielman RS, Ewens WJ (1996) The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 59:983–989
Spielman RS, McGinnis RE, Ewens WJ (1993) Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 52:506–516
Wilson SR (1997) On extending the transmission/disequilibrium test (TDT). Ann Hum Genet 61(Pt 2):151–161
Zhang S, Sha Q, Chen HS, Dong J, Jiang R (2003) Transmission/disequilibrium test based on haplotype sharing for tightly linked markers. Am J Hum Genet 73:566–579
Zhao H, Zhang S, Merikangas KR, Trixler M, Wildenauer DB, Sun F, Kidd KK (2000) Transmission/disequilibrium tests using multiple tightly linked markers. Am J Hum Genet 67:936–946
Zhao J, Boerwinkle E, Xiong M (2005) An entropy-based statistic for genomewide association studies. Am J Hum Genet 77:27–40


Acknowledgments

M. M. Xiong is supported by NIH-NIAMS grant IP50AR44888, grant HL74735, and NIH grant ES09912. J. Y. Zhao is supported by NIH grant ES09912. E. Boerwinkle is supported by grants from the National Heart, Lung and Blood Institute and the National Institute of General Medical Sciences.

Author information

Jinying Zhao
Present address: Division of Cardiology, Emory University School of Medicine, Atlanta, GA, 30322, USA

Authors and Affiliations

Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
Jinying Zhao, Eric Boerwinkle & Momiao Xiong

Corresponding author

Correspondence to Momiao Xiong.

Appendices

Appendix 1

Let $X = -\hat{p}\log\hat{p}$ and $Y = -\hat{q}\log\hat{q}$. Both X and Y are nonlinear functions of the allele frequencies. By the delta method, using $\operatorname{Var}(\hat{p}) = \operatorname{Var}(\hat{q}) = pq/n$ and $\operatorname{Cov}(\hat{p}, \hat{q}) = -pq/n$ (since $\hat{q} = 1 - \hat{p}$), the difference $X - Y$ is asymptotically normal with mean $X(p) - Y(q)$ and variance

$$ \begin{aligned} \operatorname{Var}(X - Y) & = [X'(p)]^{2}\frac{pq}{n} + [Y'(q)]^{2}\frac{pq}{n} + 2X'(p)Y'(q)\frac{pq}{n} \\ & = \frac{pq}{n}[X'(p) + Y'(q)]^{2} \\ & = \frac{pq}{n}(2 + \log p + \log q)^{2}, \end{aligned} $$

where $X'(p) = -1 - \log p$ and $Y'(q) = -1 - \log q$.

Under the null hypothesis of no linkage or no association, alleles $M_1$ and $M_2$ are transmitted with equal probability ($p = q = 1/2$), so $X(p) = Y(q)$. Therefore, $X - Y$ is asymptotically normal with mean zero and variance $\frac{pq}{n}(2 + \log p + \log q)^{2}$. Under the null hypothesis,

$$\hbox{TDT}_{e} = \frac{(X - Y)^{2}}{\operatorname{Var}(X - Y)} = \frac{n(\hat{p}\log\hat{p} - \hat{q}\log\hat{q})^{2}}{\hat{p}\hat{q}(2 + \log\hat{p} + \log\hat{q})^{2}}$$

is asymptotically distributed as a central $\chi^{2}_{(1)}$ distribution.
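To make the Appendix 1 formula concrete, the Python sketch below computes TDT_e alongside the original TDT, (b − c)²/(b + c), from the counts b and c of heterozygous-parent transmissions of M₁ and M₂. It assumes natural logarithms and takes p̂ = b/n and q̂ = c/n with n = b + c; the function names and the example counts are illustrative, not from the paper. The P-value uses the identity P(χ²₍₁₎ > x) = erfc(√(x/2)).

```python
import math


def tdt_classic(b: int, c: int) -> float:
    """Original (McNemar-type) TDT: (b - c)^2 / (b + c)."""
    return (b - c) ** 2 / (b + c)


def tdt_entropy(b: int, c: int) -> float:
    """Entropy-based TDT_e of Appendix 1, using natural logarithms.

    b, c: numbers of heterozygous-parent transmissions of M1 and M2,
    so n = b + c, p_hat = b / n and q_hat = c / n.
    """
    if b <= 0 or c <= 0:
        raise ValueError("both counts must be positive (log of zero)")
    n = b + c
    p, q = b / n, c / n
    numerator = n * (p * math.log(p) - q * math.log(q)) ** 2
    denominator = p * q * (2 + math.log(p) + math.log(q)) ** 2
    return numerator / denominator


def chi2_1_sf(x: float) -> float:
    """Upper-tail probability of a central chi-square with 1 df."""
    return math.erfc(math.sqrt(x / 2))


if __name__ == "__main__":
    b, c = 60, 40  # hypothetical transmission counts
    for name, stat in [("TDT", tdt_classic(b, c)), ("TDT_e", tdt_entropy(b, c))]:
        print(f"{name}: {stat:.3f}, P = {chi2_1_sf(stat):.4f}")
```

For b = 60 and c = 40 the sketch gives TDT = 4.00 and TDT_e ≈ 4.57. With natural logarithms, the denominator of TDT_e vanishes when p̂q̂ = e⁻² (p̂ ≈ 0.161 or 0.839), where the statistic becomes unbounded.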

Appendix 2

First, we calculate ${\hbox{var} (\hat{p}_{{i \cdot}})}.$ By definition, we have

$$ \begin{aligned} \operatorname{var}(\hat{p}_{i\cdot}) &= \operatorname{var}\left(\frac{1}{n}\sum_{j=1}^{k} n_{ij}\right) \\ &= \frac{1}{n^{2}}\left\{ n\sum_{j=1}^{k} p_{ij}(1 - p_{ij}) - n\sum_{j=1}^{k}\sum_{\substack{l=1 \\ l \ne j}}^{k} p_{ij}p_{il} \right\} \\ &= \frac{1}{n}p_{i\cdot}(1 - p_{i\cdot}), \end{aligned} $$

where $p_{ii} = 0$.

Similarly, we have

$$\hbox{var} (\hat{q}_{{i.}}) = \frac{1}{n}q_{{i.}} (1 - q_{{i.}}).$$

Next, we calculate the covariance between $\hat{p}_{i\cdot}$ and $\hat{p}_{j\cdot}$ ($i \ne j$). Again, by definition, we obtain

$$ \begin{aligned} \operatorname{cov}(\hat{p}_{i\cdot}, \hat{p}_{j\cdot}) & = \operatorname{cov}\left(\frac{1}{n}\sum_{l=1}^{k} n_{il}, \frac{1}{n}\sum_{m=1}^{k} n_{jm}\right) \\ & = -\frac{1}{n}\sum_{l=1}^{k}\sum_{m=1}^{k} p_{il}p_{jm} \\ & = -\frac{1}{n}p_{i\cdot}p_{j\cdot} \end{aligned} $$

Similarly, we have

$$\hbox{cov} (\hat{q}_{{i.}}, \hat{q}_{{j.}}) = - \frac{1}{n}q_{{i.}} q_{{j.}}$$

Now we calculate ${\hbox{cov} (\hat{p}_{{i \cdot}}, \hat{q}_{{j.}}).}$ First, we consider i ≠ j. In these cases, we have

$$ \begin{aligned} \operatorname{cov}(\hat{p}_{i\cdot}, \hat{q}_{j\cdot}) & = \operatorname{cov}\left(\frac{1}{n}\sum_{l=1}^{k} n_{il}, \frac{1}{n}\sum_{m=1}^{k} n_{mj}\right) \\ & = -\frac{1}{n}\sum_{\substack{l, m = 1 \\ (l, m) \ne (j, i)}}^{k} p_{il}p_{mj} + \frac{1}{n}p_{ij}(1 - p_{ij}) \\ & = \frac{1}{n}(p_{ij} - p_{i\cdot}q_{j\cdot}) \end{aligned} $$

Similarly, we have ${\hbox{cov} (\hat{q}_{{i.}}, \hat{p}_{{j \cdot}}) = \frac{1}{n}(p_{{ji}} - q_{{i.}} p_{{j \cdot}})}$ when i ≠ j.

Finally, we consider i = j. In this case, we obtain

$$ \begin{aligned} \operatorname{cov}(\hat{p}_{i\cdot}, \hat{q}_{i\cdot}) & = \operatorname{cov}\left(\frac{1}{n}\sum_{m=1}^{k} n_{im}, \frac{1}{n}\sum_{l=1}^{k} n_{li}\right) \\ & = -\frac{1}{n}\sum_{m=1}^{k}\sum_{l=1}^{k} p_{im}p_{li} + \frac{1}{n}p_{ii} \\ & = -\frac{1}{n}p_{i\cdot}q_{i\cdot} \end{aligned} $$

Thus, we have proven Eq. 6.
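Because $\hat{p}$ and $\hat{q}$ are linear in the multinomial cell counts $n_{ij}$, the covariance formulas above can be checked exactly rather than by simulation. The Python sketch below (NumPy; the function names are illustrative) compares the closed forms with covariances propagated from the multinomial covariance $n[\operatorname{diag}(\pi) - \pi\pi^{\rm T}]$ of the flattened $k \times k$ table, assuming rows index transmitted alleles, columns index nontransmitted alleles, and the diagonal is zero.

```python
import numpy as np


def analytic_covariances(pi: np.ndarray, n: int):
    """Closed forms derived above for Cov(p_hat), Cov(q_hat), Cov(p_hat, q_hat).

    pi[i, j]: probability that a heterozygous parent transmits M_i and does
    not transmit M_j (rows: transmitted, columns: nontransmitted), pi[i, i] = 0.
    """
    p = pi.sum(axis=1)  # p_{i.}: transmitted-allele frequencies
    q = pi.sum(axis=0)  # q_{i.}: nontransmitted-allele frequencies
    cov_p = (np.diag(p) - np.outer(p, p)) / n
    cov_q = (np.diag(q) - np.outer(q, q)) / n
    cov_pq = (pi - np.outer(p, q)) / n  # diagonal reduces to -p_i q_i / n
    return cov_p, cov_q, cov_pq


def exact_covariances(pi: np.ndarray, n: int):
    """Exact covariances propagated from the multinomial cell-count covariance."""
    k = pi.shape[0]
    flat = pi.ravel()  # row-major: index i * k + j
    cov_counts = n * (np.diag(flat) - np.outer(flat, flat))
    rows = np.kron(np.eye(k), np.ones((1, k)))  # row-sum operator
    cols = np.kron(np.ones((1, k)), np.eye(k))  # column-sum operator
    cov_p = rows @ cov_counts @ rows.T / n**2
    cov_q = cols @ cov_counts @ cols.T / n**2
    cov_pq = rows @ cov_counts @ cols.T / n**2
    return cov_p, cov_q, cov_pq


if __name__ == "__main__":
    # Hypothetical 3-allele transmission probabilities (zero diagonal, sum 1).
    pi = np.array([[0.0, 0.2, 0.1], [0.15, 0.0, 0.05], [0.3, 0.2, 0.0]])
    for exact, formula in zip(exact_covariances(pi, 50), analytic_covariances(pi, 50)):
        print(np.allclose(exact, formula))
```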

Let $h(p) = [h(p_{1\cdot}), \ldots, h(p_{k\cdot})]^{\rm T}$ and $h(q) = [h(q_{1\cdot}), \ldots, h(q_{k\cdot})]^{\rm T}$, where $h(p_{i\cdot}) = -p_{i\cdot}\log p_{i\cdot}$ and $h(q_{i\cdot}) = -q_{i\cdot}\log q_{i\cdot}$. By the delta method, $h(\hat{p}) - h(p)$ and $h(\hat{q}) - h(q)$ are asymptotically normally distributed with mean zero and covariance matrices $\frac{1}{n}B\Sigma_{p}B^{\rm T}$ and $\frac{1}{n}C\Sigma_{q}C^{\rm T}$, respectively. Here $\Sigma_{p}$ and $\Sigma_{q}$ are $n$ times the covariance matrices of $\hat{p}$ and $\hat{q}$ derived above, $\Sigma_{pq}$ is $n$ times their cross-covariance matrix, and $B = (b_{ij})_{k \times k}$ and $C = (c_{ij})_{k \times k}$ are diagonal matrices with

$$b_{ii} = \frac{\partial h(p_{i\cdot})}{\partial p_{i\cdot}} = -1 - \log p_{i\cdot}, \qquad c_{ii} = \frac{\partial h(q_{i\cdot})}{\partial q_{i\cdot}} = -1 - \log q_{i\cdot}, \qquad b_{ij} = c_{ij} = 0 \quad (j \ne i).$$

Under the null hypothesis of no linkage or no association, $h(p) = h(q)$; thus $h(\hat{p}) - h(\hat{q}) = h(\hat{p}) - h(p) - [h(\hat{q}) - h(q)]$ is asymptotically normally distributed:

$$h(\hat{p}) - h(\hat{q}) = h(\hat{p}) - h(p) - [h(\hat{q}) - h(q)]\sim N\left(0,\frac{1}{n}\Lambda\right),$$

where

$$\Lambda = B\Sigma_{p} B^{\rm {T}} + C\Sigma_{q} C^{\rm {T}} - B\Sigma_{{pq}} C^{\rm {T}} - C\Sigma^{\rm {T}}_{{pq}} B^{\rm {T}}.$$

(11)

Applying Theorem 4.4.3 of Graybill (1976), we find that $n[h(\hat{p}) - h(\hat{q})]^{\rm T}\Lambda_{e}^{-}[h(\hat{p}) - h(\hat{q})]$ is asymptotically distributed as a central $\chi^{2}_{(r)}$ distribution under the null hypothesis of no linkage or no association, where $\Lambda_{e}$ is the estimator of $\Lambda$ obtained by substituting the estimators of $p_{i\cdot}$ and $p_{\cdot i}$ into Eq. 11, $\Lambda_{e}^{-}$ is a generalized inverse of $\Lambda_{e}$, and $r = \operatorname{rank}(\Lambda_{e})$.
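A minimal sketch of the multi-allelic statistic follows, assuming a $k \times k$ table of transmission counts (rows: transmitted allele, columns: nontransmitted allele), natural logarithms, and the Moore–Penrose inverse (computed from an eigendecomposition) as the generalized inverse $\Lambda_{e}^{-}$; that choice, the tolerance, and the function name are assumptions of the sketch. $\Sigma_p$, $\Sigma_q$ and $\Sigma_{pq}$ are built from the covariance formulas of this appendix scaled by $n$. For $k = 2$ the quadratic form reduces algebraically to the Appendix 1 statistic TDT_e with $r = 1$.

```python
import numpy as np


def entropy_tdt(table, tol: float = 1e-10):
    """Multi-allelic entropy-based TDT (sketch of the Appendix 2 statistic).

    table[i, j]: number of heterozygous parents transmitting M_i and not
    transmitting M_j. Diagonal entries are ignored (p_ii = 0).
    Returns (statistic, degrees of freedom r = rank of Lambda_e).
    """
    counts = np.array(table, dtype=float)
    np.fill_diagonal(counts, 0.0)
    n = counts.sum()
    pi = counts / n
    p = pi.sum(axis=1)  # transmitted-allele frequencies p_{i.}
    q = pi.sum(axis=0)  # nontransmitted-allele frequencies q_{i.}
    if np.any(p <= 0) or np.any(q <= 0):
        raise ValueError("each allele must be transmitted and nontransmitted at least once")

    # n times the covariance matrices (Eq. 6).
    sigma_p = np.diag(p) - np.outer(p, p)
    sigma_q = np.diag(q) - np.outer(q, q)
    sigma_pq = pi - np.outer(p, q)

    # Diagonal Jacobians of h(x) = -x log x.
    B = np.diag(-1.0 - np.log(p))
    C = np.diag(-1.0 - np.log(q))

    # Lambda_e of Eq. 11 evaluated at the estimates.
    lam = (B @ sigma_p @ B.T + C @ sigma_q @ C.T
           - B @ sigma_pq @ C.T - C @ sigma_pq.T @ B.T)

    # Moore-Penrose inverse of the symmetric matrix via its eigendecomposition.
    w, V = np.linalg.eigh(lam)
    keep = w > tol * w.max()
    lam_ginv = (V[:, keep] / w[keep]) @ V[:, keep].T

    d = -p * np.log(p) + q * np.log(q)  # h(p_hat) - h(q_hat)
    stat = float(n * d @ lam_ginv @ d)
    return stat, int(keep.sum())


if __name__ == "__main__":
    # Hypothetical 3-allele transmission table.
    print(entropy_tdt([[0, 40, 25], [20, 0, 15], [10, 30, 0]]))
```

If SciPy is available, `scipy.stats.chi2.sf(stat, df)` converts the statistic into a P-value.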

Appendix 3

Following the approach of Sham and Curtis, we can obtain the joint probability that a heterozygous parent with genotype $M_iM_j$ transmits allele $M_i$ to an affected child. Let TM denote the transmitted marker allele, NM the nontransmitted marker allele, TD the transmitted disease allele, OTD the disease allele transmitted by the other parent, and TH the transmitted haplotype. Let $P_i$ be the frequency of allele $M_i$ at the marker locus, $P_{D_{k}}$ the frequency of disease allele $D_k$, $P_{ki}$ the frequency of haplotype $H_{D_{k}M_{i}}$, and θ the recombination fraction between the marker and disease loci. Define the measure of LD between the marker and disease loci as

$$\delta_{{kj}} = P_{{kj}} - P_{{{\rm D}_{k}}} P_{j}.$$

Then, the probability that haplotype $H_{D_{k}M_{i}}$ is transmitted and marker allele $M_j$ is not transmitted is given by

$$P(\hbox{TH} = \hbox{D}_{k} M_{i}, NM = M_{j}) = (1 - \theta)P_{{ki}} P_{j} + \theta P_{{kj}} P_{i}$$

(12)

The joint probability that a heterozygous parent with genotype $M_iM_j$ transmits allele $M_i$ to an affected child is given by

$$ \begin{aligned} P(\hbox{TM} & = M_{i}, \hbox{NM} = M_{j} |A) \\ & = \frac{1} {{P(A)}}P(\hbox{TM} = M_{i}, \hbox{NM} = M_{j},A) \\ & = \frac{1} {{P(A)}}{\sum\limits_k {{\sum\limits_l {P(\hbox{TH} = \hbox{D}_{k} M_{i}, \hbox{NM} = M_{j} )P(\hbox{D}_{l} |\hbox{TH} = \hbox{D}_{k} M_{i}, \hbox{NM} = M_{j})P(A|\hbox{D}_{k} \hbox{D}_{l}).} }} } \\ \end{aligned} $$

Using Eq. 12, we have

$$ \begin{aligned} P(\hbox{TM} & = M_{i}, \hbox{NM} = M_{j} |A) \\ & = \frac{1} {{P(A)}}{\sum\limits_k {{\sum\limits_l {f_{{kl}} } }} }P_{{{\rm D}_{l} }} [(1 - \theta)P_{{ki}} P_{j} + \theta P_{{kj}} P_{i} ] \\ & = P_{i} P_{j} + \frac{1} {{P(A)}}[P_{j} {\sum\limits_k {{\sum\limits_l {f_{{kl}} } }} }P_{{{\rm D}_{l} }} \delta _{{ki}} + \theta {\sum\limits_k {{\sum\limits_l {f_{{kl}} P_{{{\rm D}_{l} }} } }} }(P_{i} \delta _{{kj}} - P_{j} \delta _{{ki}})] \\ \end{aligned} $$

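where $f_{kl} = P(A \mid \hbox{D}_{k}\hbox{D}_{l})$ is the penetrance of genotype $\hbox{D}_{k}\hbox{D}_{l}$; the remaining notation is as defined in the main text.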

Summing over all j in the above equation and using $\sum_j P_j = 1$ and $\sum_j \delta_{kj} = 0$, we obtain the probability of transmitting marker allele $M_i$ to an affected child. The second line specializes to a diallelic disease locus with alleles D and d, for which $\delta_{2i} = -\delta_{1i}$ and $f_{21} = f_{12}$:

$$ \begin{aligned} p_{i\cdot} & = P(\hbox{TM} = M_{i} \mid \hbox{Affected}) = P_{i} + \frac{1}{P(A)}\left(\sum_{k}\sum_{l} f_{kl}P_{{\rm D}_{l}}\delta_{ki} - \theta\sum_{k}\sum_{l} f_{kl}P_{{\rm D}_{l}}\delta_{ki}\right) \\ & = P_{i} + \frac{1 - \theta}{P(A)}[(f_{11} - f_{12})P_{\rm D} + (f_{12} - f_{22})P_{\rm d}]\delta_{1i} \end{aligned} $$

Similarly, we have

$$p_{\cdot j} = P_{j} + \frac{\theta}{P(A)}[(f_{11} - f_{12})P_{\rm D} + (f_{12} - f_{22})P_{\rm d}]\delta_{1j}$$
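The closed forms for $p_{i\cdot}$ and $p_{\cdot j}$ can be checked numerically against the unsimplified joint probability $P(\hbox{TM} = M_i, \hbox{NM} = M_j \mid A)$. The Python sketch below assumes a diallelic disease locus (D, d), a multi-allelic marker, penetrances $f = (f_{11}, f_{12}, f_{22})$, and $P(A) = \sum_k\sum_l f_{kl}P_{D_k}P_{D_l}$; the function names and the variable `scale` (the bracketed penetrance term divided by $P(A)$) are illustrative.

```python
import numpy as np


def allele_transmission(P, delta1, P_D, f, theta):
    """Closed-form transmitted (p_i.) and nontransmitted (p_.j) marker-allele
    frequencies in affected offspring, diallelic disease locus D/d.

    P: marker allele frequencies; delta1[j] = P_{1j} - P_D * P_j;
    f = (f11, f12, f22): penetrances; theta: recombination fraction.
    """
    P, delta1 = np.asarray(P, float), np.asarray(delta1, float)
    f11, f12, f22 = f
    P_d = 1.0 - P_D
    P_A = f11 * P_D**2 + 2 * f12 * P_D * P_d + f22 * P_d**2
    scale = ((f11 - f12) * P_D + (f12 - f22) * P_d) / P_A
    return P + (1 - theta) * scale * delta1, P + theta * scale * delta1


def joint_transmission(P, delta1, P_D, f, theta):
    """Direct evaluation of P(TM = M_i, NM = M_j | A) before simplification."""
    P, delta1 = np.asarray(P, float), np.asarray(delta1, float)
    f11, f12, f22 = f
    P_dis = np.array([P_D, 1.0 - P_D])                       # P_{D_1}, P_{D_2}
    pen = np.array([[f11, f12], [f12, f22]])                  # f_{kl}
    H = np.vstack([delta1 + P_D * P, -delta1 + P_dis[1] * P])  # H[k, i] = P_{ki}
    P_A = P_dis @ pen @ P_dis
    weight = pen @ P_dis                                      # sum_l f_{kl} P_{D_l}
    # sum_k weight[k] * [(1 - theta) P_ki P_j + theta P_kj P_i]
    joint = ((1 - theta) * np.einsum("k,ki,j->ij", weight, H, P)
             + theta * np.einsum("k,kj,i->ij", weight, H, P))
    return joint / P_A


if __name__ == "__main__":
    # Hypothetical parameters satisfying the LD constraints of Eq. 13.
    args = ([0.3, 0.7], [0.05, -0.05], 0.2, (0.9, 0.5, 0.05), 0.01)
    pt, pn = allele_transmission(*args)
    J = joint_transmission(*args)
    print(np.allclose(J.sum(axis=1), pt), np.allclose(J.sum(axis=0), pn))
```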

Note that

$$ \begin{aligned} &\delta_{{1j}} + \delta_{{2j}} = P_{{1j}} - P_{{\rm D}} P_{j} + P_{{2j}} - P_{{\rm d}} P_{j} = P_{j} - P_{j} = 0, \min (P_{{\rm D}}, P_{j}) \geqslant P_{{1j}} = \delta_{{1j}} + P_{{\rm D}} P_{j} \geqslant 0,\\ &\min (P_{{\rm d}}, P_{j}) \geqslant P_{{2j}} = - \delta_{{1j}} + (1 - P_{{\rm D}})P_{j} \geqslant 0. \end{aligned} $$

Thus, the LD measure $\delta_{1j}$ between disease allele D and marker allele $M_j$ must satisfy the following constraints:

$$\min (P_{j} P_{{\rm d}}, (1 - P_{j})P_{{\rm D}}) \geqslant \delta_{{1j}} \geqslant \max (- P_{{\rm D}} P_{j}, - P_{{\rm d}} (1 - P_{j}))$$

(13)

The expected LD $\delta_{1j}$ between the disease allele and marker allele $M_j$ decays over generations as

$$E[\delta_{{1j}} (t)] = \delta_{{1j}} (0)(1 - \theta)^{t}$$

(14)

where t is the number of generations since the LD between the marker and disease loci arose, and $\delta_{1j}(0)$ is the initial LD. The initial value $\delta_{1j}(0)$ must satisfy the constraints in Eq. 13.
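The admissible range of Eq. 13 and the decay of Eq. 14 can be combined into a small helper, sketched below in Python. The function names are hypothetical; the sketch simply rejects an initial LD value that lies outside the Eq. 13 bounds.

```python
def ld_bounds(P_j: float, P_D: float) -> tuple[float, float]:
    """Admissible range of delta_{1j} = P_{1j} - P_D * P_j (Eq. 13)."""
    P_d = 1.0 - P_D
    lower = max(-P_D * P_j, -P_d * (1.0 - P_j))
    upper = min(P_j * P_d, (1.0 - P_j) * P_D)
    return lower, upper


def expected_ld(delta0: float, theta: float, t: int, P_j: float, P_D: float) -> float:
    """Expected delta_{1j}(t) = delta_{1j}(0) * (1 - theta)^t (Eq. 14)."""
    lower, upper = ld_bounds(P_j, P_D)
    if not lower <= delta0 <= upper:
        raise ValueError(f"initial LD {delta0} outside [{lower}, {upper}]")
    return delta0 * (1.0 - theta) ** t


if __name__ == "__main__":
    print(ld_bounds(0.3, 0.2))                  # admissible range for P_j = 0.3, P_D = 0.2
    print(expected_ld(0.1, 0.01, 100, 0.3, 0.2))  # LD remaining after 100 generations
```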

Now we derive the frequencies of the transmitted and nontransmitted haplotypes. Recall that TH and NH denote the transmitted and nontransmitted haplotypes, respectively. For two marker loci flanking the disease locus, with recombination fractions θ₁ and θ₂ between the disease locus and the first and second markers, the transmitted three-locus haplotype arises from no recombination, a single recombination in either interval, or a double recombination (Wilson 1997). Thus, we have

$$ \begin{aligned} P(\hbox{TH} & = H_{{i_{1} si_{2} }}, \hbox{NH} = H_{{j_{1} j_{2} }} ) \\ & = (1 - \theta _{1})(1 - \theta _{2})P_{{i_{1} si_{2} }} P_{{j_{1} j_{2} }} + \theta _{1} (1 - \theta _{2})P_{{j_{1} si_{2} }} P_{{i_{1} j_{2} }} + (1 - \theta _{1})\theta _{2} P_{{i_{1} sj_{2} }} P_{{j_{1} i_{2} }} + \theta _{1} \theta _{2} P_{{j_{1} sj_{2} }} P_{{i_{1} i_{2} }} \\ \end{aligned} $$

$$ \begin{aligned} P(\hbox{TH} & = H_{{i_{1} i_{2} }}, \hbox{NH} = H_{{j_{1} j_{2} }} ,{\rm Affected}) \\ & = {\sum\limits_s {{\sum\limits_u {f_{{\hbox{su}}} } }} }P_{{\hbox{Du}}} [(1 - \theta _{1})(1 - \theta _{2})P_{{i_{1} si_{2} }} P_{{j_{1} j_{2} }} + \theta _{1} (1 - \theta _{2})P_{{j_{1} si_{2} }} P_{{i_{1} j_{2} }} \\ & \quad + (1 - \theta _{1})\theta _{2} P_{{i_{1} sj_{2} }} P_{{j_{1} i_{2} }} + \theta _{1} \theta _{2} P_{{j_{1} sj_{2} }} P_{{i_{1} i_{2} }} ] \\ \end{aligned} $$

(15)

Let $b = \frac{(f_{11} - f_{12})P_{\rm D} + (f_{12} - f_{22})P_{\rm d}}{P(A)}$ and $a = \frac{f_{12}P_{\rm D} + f_{22}P_{\rm d}}{P(A)}$. Then, after some algebra on Eq. 15, we obtain

$$ \begin{aligned} P(\hbox{TH} & = H_{{i_{1} i_{2} }}, \hbox{NH} = H_{{j_{1} j_{2} }} |\hbox{Affected}) \\ & = (1 - \theta _{1})(1 - \theta _{2})P_{{j_{1} j_{2} }} (bP_{{i_{1} \hbox{D}i_{2} }} + aP_{{i_{1} i_{2} }}) + \theta _{1} (1 - \theta _{2})P_{{i_{1} j_{2} }} (bP_{{j_{1} \hbox{D}i_{2} }} + aP_{{j_{1} i_{2} }}) \\ & \quad + (1 - \theta _{1})\theta _{2} P_{{j_{1} i_{2} }} (bP_{{i_{1} \hbox{D}j_{2} }} + aP_{{i_{1} j_{2} }}) + \theta _{1} \theta _{2} P_{{i_{1} i_{2} }} (bP_{{j_{1} \hbox{D}j_{2} }} + aP_{{j_{1} j_{2} }}) \\ \end{aligned} $$

(16)

Thus, the probability that the haplotype $H_{i_{1}i_{2}}$ is transmitted to an affected child is given by

$$ \begin{aligned} P(\hbox{TH} = H_{i_1 i_2} \mid \hbox{Affected}) & = \sum_{j_1}\sum_{j_2} P(\hbox{TH} = H_{i_1 i_2}, \hbox{NH} = H_{j_1 j_2} \mid \hbox{Affected}) \\ & = (1 - \theta_1)(1 - \theta_2)(bP_{i_1 {\rm D} i_2} + aP_{i_1 i_2}) + \theta_1(1 - \theta_2)P_{i_1}(bP_{{\rm D} i_2} + aP_{i_2}) \\ & \quad + (1 - \theta_1)\theta_2 P_{i_2}(bP_{i_1 \rm D} + aP_{i_1}) + \theta_1\theta_2 P_{i_1 i_2} \end{aligned} $$

(17)

(Note that $bP_{\rm D} + a = 1$, which follows from $P(A) = f_{11}P_{\rm D}^{2} + 2f_{12}P_{\rm D}P_{\rm d} + f_{22}P_{\rm d}^{2}$; this identity gives the last term.)

Similarly, we have

$$ \begin{aligned} P(\hbox{NH} = H_{j_1 j_2} \mid \hbox{Affected}) & = (1 - \theta_1)(1 - \theta_2)P_{j_1 j_2} + \theta_1(1 - \theta_2)P_{j_2}(bP_{j_1 \rm D} + aP_{j_1}) \\ & \quad + (1 - \theta_1)\theta_2 P_{j_1}(bP_{{\rm D} j_2} + aP_{j_2}) + \theta_1\theta_2(bP_{j_1 {\rm D} j_2} + aP_{j_1 j_2}) \end{aligned} $$

(18)
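As a numerical check of Eqs. 15–17, the Python sketch below builds $P(\hbox{TH} = H_{i_1 i_2}, \hbox{NH} = H_{j_1 j_2} \mid \hbox{Affected})$ directly from a table of three-locus haplotype frequencies and compares its transmitted margin with the closed form in terms of a and b. It assumes a diallelic disease locus indexed s = 0 (D) and s = 1 (d), a symmetric penetrance matrix ($f_{21} = f_{12}$), and $P(A) = \sum_s\sum_u f_{su}P_{D_s}P_{D_u}$; the function names and the random haplotype table are illustrative.

```python
import numpy as np


def ab_coefficients(P_D, f):
    """a, b and P(A) for a diallelic disease locus D/d."""
    f11, f12, f22 = f
    P_d = 1.0 - P_D
    P_A = f11 * P_D**2 + 2 * f12 * P_D * P_d + f22 * P_d**2
    b = ((f11 - f12) * P_D + (f12 - f22) * P_d) / P_A
    a = (f12 * P_D + f22 * P_d) / P_A
    return a, b, P_A


def haplotype_transmission(H, f, theta1, theta2):
    """P(TH = H_{i1 i2}, NH = H_{j1 j2} | Affected) from Eq. 15.

    H[i1, s, i2]: frequency of the three-locus haplotype M_{i1} D_s M_{i2}.
    Returns an array indexed [i1, i2, j1, j2].
    """
    H = np.asarray(H, float)
    M = H.sum(axis=1)                        # two-marker haplotype frequencies
    P_dis = H.sum(axis=(0, 2))               # (P_D, P_d)
    f11, f12, f22 = f
    w = np.array([[f11, f12], [f12, f22]]) @ P_dis  # sum_u f_{su} P_{D_u}
    P_A = P_dis @ w
    t1 = np.einsum("s,asb,cd->abcd", w, H, M)  # no recombination
    t2 = np.einsum("s,csb,ad->abcd", w, H, M)  # recombination in interval 1
    t3 = np.einsum("s,asd,cb->abcd", w, H, M)  # recombination in interval 2
    t4 = np.einsum("s,csd,ab->abcd", w, H, M)  # double recombination
    joint = ((1 - theta1) * (1 - theta2) * t1 + theta1 * (1 - theta2) * t2
             + (1 - theta1) * theta2 * t3 + theta1 * theta2 * t4)
    return joint / P_A


def transmitted_closed_form(H, f, theta1, theta2):
    """Eq. 17: P(TH = H_{i1 i2} | Affected) via the coefficients a and b."""
    H = np.asarray(H, float)
    M = H.sum(axis=1)
    P1, P2 = M.sum(axis=1), M.sum(axis=0)               # marker allele frequencies
    P_1D, P_D2 = H[:, 0, :].sum(axis=1), H[:, 0, :].sum(axis=0)
    a, b, _ = ab_coefficients(H[:, 0, :].sum(), f)
    return ((1 - theta1) * (1 - theta2) * (b * H[:, 0, :] + a * M)
            + theta1 * (1 - theta2) * np.outer(P1, b * P_D2 + a * P2)
            + (1 - theta1) * theta2 * np.outer(b * P_1D + a * P1, P2)
            + theta1 * theta2 * M)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.random((2, 2, 2))
    H /= H.sum()  # hypothetical haplotype frequencies
    J = haplotype_transmission(H, (0.8, 0.4, 0.05), 0.02, 0.03)
    print(np.allclose(J.sum(axis=(2, 3)), transmitted_closed_form(H, (0.8, 0.4, 0.05), 0.02, 0.03)))
```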

Using the following relationships between the haplotype frequencies and the measures of LD:

$$\begin{aligned} &P_{i_1 {\rm D} i_2} = \delta_{i_1 {\rm D} i_2} + P_{i_1}\delta_{{\rm D} i_2} + P_{i_2}\delta_{i_1 \rm D} + P_{i_1}P_{\rm D}P_{i_2}, \\ &\hbox{and } \delta_{i_1 \rm D} = P_{i_1 \rm D} - P_{i_1}P_{\rm D}, \quad \delta_{{\rm D} i_2} = P_{{\rm D} i_2} - P_{\rm D}P_{i_2}, \quad \delta_{i_1 i_2} = P_{i_1 i_2} - P_{i_1}P_{i_2}, \end{aligned}$$

we obtain

$$ \begin{aligned} P(\hbox{TH} = H_{i_1 i_2} \mid \hbox{Affected}) & = P_{i_1}P_{i_2} + b(1 - \theta_1)(1 - \theta_2)\delta_{i_1 {\rm D} i_2} + bP_{i_2}(1 - \theta_1)\delta_{i_1 \rm D} \\ & \quad + bP_{i_1}(1 - \theta_2)\delta_{{\rm D} i_2} + [\theta_1\theta_2 + (1 - \theta_1)(1 - \theta_2)a]\delta_{i_1 i_2} \end{aligned} $$

$$ \begin{aligned} P(\hbox{NH} = H_{j_1 j_2} \mid \hbox{Affected}) & = P_{j_1}P_{j_2} + b\theta_1\theta_2\delta_{j_1 {\rm D} j_2} + bP_{j_2}\theta_1\delta_{j_1 \rm D} \\ & \quad + bP_{j_1}\theta_2\delta_{{\rm D} j_2} + [a\theta_1\theta_2 + (1 - \theta_1)(1 - \theta_2)]\delta_{j_1 j_2} \end{aligned} $$

(19)

Measures of LD are random variables. The expectation of $\delta_{i_{1}si_{2}}$ is equal to

$$E[\delta_{{i_{1} si_{2}}}] = E[P_{{i_{1} si_{2}}}] - P_{{i_{1}}} E[\delta_{{si_{2}}}] - P_{{i_{2}}} E[\delta_{{i_{1} s}}] - P_{{i_{1}}} P_{{{\rm D}_{s}}} P_{{i_{2}}}.$$

It has been shown that $E[P_{i_1 s i_2}] = \delta_{i_1 s i_2}(0)(1 - \theta_1)^{t}(1 - \theta_2)^{t} + P_{i_1}E[\delta_{s i_2}] + P_{i_2}E[\delta_{i_1 s}] + P_{i_1}P_{{\rm D}_s}P_{i_2}$, where $\delta_{i_1 s i_2}(0)$ is the initial LD among the three loci $M_{i_1}D_{s}M_{i_2}$. Thus, we have

$$E[\delta_{{i_{1} si_{2}}}] = \delta_{{i_{1} si_{2}}} (0)(1 - \theta_{1})^{t} (1 - \theta_{2})^{t}$$

(20)

Substituting $E[\delta_{i_1 s}]$ and $E[\delta_{s i_2}]$ from Eq. 14 and $E[\delta_{i_1 s i_2}]$ from Eq. 20 into Eq. 19, we obtain

$$ \begin{aligned} P(\hbox{TH} = H_{i_1 i_2} \mid \hbox{Affected}) & = P_{i_1}P_{i_2} + b\delta_{i_1 {\rm D} i_2}(0)(1 - \theta_1)^{t+1}(1 - \theta_2)^{t+1} + bP_{i_2}\delta_{i_1 \rm D}(0)(1 - \theta_1)^{t+1} \\ & \quad + bP_{i_1}\delta_{{\rm D} i_2}(0)(1 - \theta_2)^{t+1} + [\theta_1\theta_2 + (1 - \theta_1)(1 - \theta_2)a]\delta_{i_1 i_2}(0)(1 - \theta_1 - \theta_2)^{t} \end{aligned} $$

$$ \begin{aligned} P(\hbox{NH} = H_{j_1 j_2} \mid \hbox{Affected}) & = P_{j_1}P_{j_2} + b\delta_{j_1 {\rm D} j_2}(0)\theta_1\theta_2(1 - \theta_1)^{t}(1 - \theta_2)^{t} + bP_{j_2}\delta_{j_1 \rm D}(0)\theta_1(1 - \theta_1)^{t} \\ & \quad + bP_{j_1}\delta_{{\rm D} j_2}(0)\theta_2(1 - \theta_2)^{t} + [a\theta_1\theta_2 + (1 - \theta_1)(1 - \theta_2)]\delta_{j_1 j_2}(0)(1 - \theta_1 - \theta_2)^{t} \end{aligned} $$

Letting $P_{i_1 i_2 \cdot} = P(\hbox{TH} = H_{i_1 i_2} \mid \hbox{Affected})$ and $P_{\cdot i_1 i_2} = P(\hbox{NH} = H_{i_1 i_2} \mid \hbox{Affected})$, we obtain

$$ \begin{aligned} P_{i_1 i_2 \cdot} & = P_{i_1}P_{i_2} + b\delta_{i_1 {\rm D} i_2}(0)(1 - \theta_1)^{t+1}(1 - \theta_2)^{t+1} + bP_{i_2}\delta_{i_1 \rm D}(0)(1 - \theta_1)^{t+1} + bP_{i_1}\delta_{{\rm D} i_2}(0)(1 - \theta_2)^{t+1} \\ & \quad + [\theta_1\theta_2 + (1 - \theta_1)(1 - \theta_2)a]\delta_{i_1 i_2}(0)(1 - \theta_1 - \theta_2)^{t} \end{aligned} $$

$$ \begin{aligned} P_{\cdot i_1 i_2} & = P_{i_1}P_{i_2} + b\delta_{i_1 {\rm D} i_2}(0)\theta_1\theta_2(1 - \theta_1)^{t}(1 - \theta_2)^{t} + bP_{i_2}\delta_{i_1 \rm D}(0)\theta_1(1 - \theta_1)^{t} + bP_{i_1}\delta_{{\rm D} i_2}(0)\theta_2(1 - \theta_2)^{t} \\ & \quad + [(1 - \theta_1)(1 - \theta_2) + \theta_1\theta_2 a]\delta_{i_1 i_2}(0)(1 - \theta_1 - \theta_2)^{t} \end{aligned} $$

Based on the above equations, we can calculate

$$\mu_{T} = [- P_{{11 \cdot}} \log P_{{11 \cdot}}, - P_{{12 \cdot}} \log P_{{12 \cdot}}, - P_{{21 \cdot}} \log P_{{21 \cdot}}, - P_{{22 \cdot}} \log P_{{22 \cdot}}]^{T}$$

$$\mu_{{NT}} = [- P_{{\cdot 11}} \log P_{{\cdot 11}}, - P_{{\cdot 12}} \log P_{{\cdot 12}}, - P_{{\cdot 21}} \log P_{{\cdot 21}}, - P_{{\cdot 22}} \log P_{{\cdot 22}}]^{\hbox{T}}.$$
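For two diallelic flanking markers, the closed-form probabilities $P_{i_1 i_2\cdot}$ and $P_{\cdot i_1 i_2}$ and the entropy vectors $\mu_T$ and $\mu_{NT}$ can be evaluated directly. The Python sketch below is one way to do so under stated assumptions: all inputs (a, b, θ₁, θ₂, t and the initial LD arrays) are user-supplied, and the function names are hypothetical. Haplotypes are flattened in the order 11, 12, 21, 22 to match $\mu_T$ and $\mu_{NT}$.

```python
import numpy as np


def haplotype_probs(P1, P2, a, b, theta1, theta2, t, d3, d1D, dD2, d12):
    """Closed-form P_{i1 i2 .} (transmitted) and P_{. i1 i2} (nontransmitted).

    P1, P2: allele frequencies at the two flanking markers; t: generations
    since LD arose. d3[i1, i2], d1D[i1], dD2[i2], d12[i1, i2]: initial LD
    delta_{i1 D i2}(0), delta_{i1 D}(0), delta_{D i2}(0), delta_{i1 i2}(0).
    """
    P1, P2 = np.asarray(P1, float), np.asarray(P2, float)
    d3, d12 = np.asarray(d3, float), np.asarray(d12, float)
    d1D, dD2 = np.asarray(d1D, float), np.asarray(dD2, float)
    r1, r2 = (1 - theta1) ** t, (1 - theta2) ** t  # LD decay factors
    base = np.outer(P1, P2)
    pair = (1 - theta1 - theta2) ** t * d12
    trans = (base
             + b * d3 * (1 - theta1) * (1 - theta2) * r1 * r2
             + b * (1 - theta1) * r1 * np.outer(d1D, P2)
             + b * (1 - theta2) * r2 * np.outer(P1, dD2)
             + (theta1 * theta2 + (1 - theta1) * (1 - theta2) * a) * pair)
    nontrans = (base
                + b * d3 * theta1 * theta2 * r1 * r2
                + b * theta1 * r1 * np.outer(d1D, P2)
                + b * theta2 * r2 * np.outer(P1, dD2)
                + ((1 - theta1) * (1 - theta2) + theta1 * theta2 * a) * pair)
    return trans, nontrans


def entropy_vector(probs):
    """[-P_11 log P_11, -P_12 log P_12, -P_21 log P_21, -P_22 log P_22]."""
    x = np.asarray(probs, float).ravel()  # row-major order: 11, 12, 21, 22
    return -x * np.log(x)


if __name__ == "__main__":
    d3 = [[0.005, -0.005], [-0.005, 0.005]]
    d12 = [[0.03, -0.03], [-0.03, 0.03]]
    tr, nt = haplotype_probs([0.4, 0.6], [0.3, 0.7], 0.3, 0.5, 0.01, 0.02, 50,
                             d3, [0.02, -0.02], [0.01, -0.01], d12)
    print(entropy_vector(tr), entropy_vector(nt))
```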

The matrices Σ_p, Σ_q, B, C, and Λ for haplotypes are defined analogously. Then, for haplotypes formed by two SNP marker loci flanking a disease locus, substituting μ_T, μ_NT, Σ_p, Σ_q, B, C, Λ, and the other parameters into Eq. 9 yields the noncentrality parameter λ_HE. Using these analytic formulas for the noncentrality parameters of the distributions of the test statistics, we can calculate the power of the test statistics under a specified alternative hypothesis.
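Eq. 9, which defines λ_HE, appears in the main text and is not reproduced here, so the sketch below takes the noncentrality parameter as an input. For a statistic with one degree of freedom, a noncentral χ²₍₁₎(λ) variable can be written as (Z + √λ)² with Z standard normal, which gives the power at significance level α using only the Python standard library. The function name is hypothetical.

```python
from statistics import NormalDist


def power_df1(noncentrality: float, alpha: float) -> float:
    """Power of a 1-df chi-square test with noncentrality lambda at level alpha.

    Uses chi2_1(lambda) = (Z + sqrt(lambda))^2 with Z standard normal.
    """
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)  # square root of the central chi2_1 critical value
    shift = noncentrality ** 0.5
    return (1 - std.cdf(z - shift)) + std.cdf(-z - shift)


if __name__ == "__main__":
    for lam in (10.0, 20.0, 40.0):  # hypothetical noncentrality values
        print(lam, round(power_df1(lam, 5e-8), 4))
```

For r > 1 degrees of freedom, as for the haplotype statistic, `scipy.stats.ncx2.sf(scipy.stats.chi2.isf(alpha, r), r, lam)` gives the corresponding power, assuming SciPy is available.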


About this article

Cite this article

Zhao, J., Boerwinkle, E. & Xiong, M. An entropy-based genome-wide transmission/disequilibrium test. Hum Genet 121, 357–367 (2007). https://doi.org/10.1007/s00439-007-0322-6


Received: 17 September 2006
Accepted: 02 January 2007
Published: 13 February 2007
Issue Date: May 2007
DOI: https://doi.org/10.1007/s00439-007-0322-6
