An entropy-based genome-wide transmission/disequilibrium test

  • Original Investigation
Human Genetics

Abstract

The availability of a large collection of single nucleotide polymorphisms (SNPs) and of efficient genotyping methods enables the extension of linkage and association studies for complex diseases from small genomic regions to the whole genome. Establishing global significance for linkage or association requires very small P-values. The original transmission/disequilibrium test (TDT) statistic compares the difference in linear functions of the numbers of transmitted and nontransmitted alleles or haplotypes. In this report, we introduce a novel TDT statistic that uses Shannon entropy as a nonlinear transformation of the frequencies of the transmitted and nontransmitted alleles (or haplotypes) to amplify the difference between them and thereby increase statistical power when the number of marker loci is large. The null distribution of the entropy-based TDT statistic and its type I error rates in both homogeneous and admixed populations are validated using a series of simulation studies. By analytical methods, we show that the power of the entropy-based TDT statistic is higher than that of the original TDT, and that this difference increases with the number of marker loci. Finally, the new entropy-based TDT statistic is applied to two real data sets to test the association of the RET gene with Hirschsprung disease and of the Fcγ receptor genes with systemic lupus erythematosus. Results show that the entropy-based TDT statistic can reach P-values that are small enough to establish significance in genome-wide linkage or association analyses.

References

  • Borrego S, Ruiz A, Saez ME, Gimm O, Gao X, Lopez-Alonso M, Hernandez A, Wright FA, Antinolo G, Eng C (2000) RET genotypes comprising specific haplotypes of polymorphic variants predispose to isolated Hirschsprung disease. J Med Genet 37:572–578

  • Bourgain C, Genin E, Margaritte-Jeannin P, Clerget-Darpoux F (2001) Maximum identity length contrast: a powerful method for susceptibility gene detection in isolated populations. Genet Epidemiol 21(Suppl 1):S560–S564

  • Clayton D, Jones H (1999) Transmission/disequilibrium tests for extended marker haplotypes. Am J Hum Genet 65:1161–1169

  • Edberg JC, Langefeld CD, Wu J, Moser KL, Kaufman KM, Kelly J, Bansal V, Brown WM, Salmon JE, Rich SS, Harley JB, Kimberly RP (2002) Genetic linkage and association of Fcgamma receptor IIIA (CD16A) on chromosome 1q23 with human systemic lupus erythematosus. Arthritis Rheum 46:2132–2140

  • Ewens WJ, Spielman RS (1995) The transmission/disequilibrium test: history, subdivision, and admixture. Am J Hum Genet 57:455–464

  • Freimer N, Sabatti C (2004) The use of pedigree, sib-pair and association studies of common diseases for genetic mapping and epidemiology. Nat Genet 36:1045–1051

  • Graybill FA (1976) Theory and application of the linear model. Duxbury Press, North Scituate

  • Hampe J, Schreiber S, Krawczak M (2003) Entropy-based SNP selection for genetic association studies. Hum Genet 114:36–43

  • Lehmann EL (1983) Theory of point estimation. Wiley, New York

  • Nothnagel M (2002) Simulation of LD block-structured SNP haplotype data and its use for the analysis of case-control data by supervised learning methods. Am J Hum Genet 71(Suppl 4): A2363

  • Rabinowitz D, Laird N (2000) A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered 50:211–223

  • Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517

  • Schaid DJ (1996) General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol 13:423–449

  • Sham PC (1997) Transmission/disequilibrium tests for multiallelic loci. Am J Hum Genet 61:774–778

  • Sham PC, Curtis D (1995a) An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet 59:323–336

  • Sham PC, Curtis D (1995b) An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet 59(Pt 3):323–336

  • Shannon CE (1948) A mathematical theory of communication. Bell Systems Tech J 27:379–423

  • Spielman RS, Ewens WJ (1996) The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 59:983–989

  • Spielman RS, McGinnis RE, Ewens WJ (1993) Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 52:506–516

  • Wilson SR (1997) On extending the transmission/disequilibrium test (TDT). Ann Hum Genet 61(Pt 2):151–161

  • Zhang S, Sha Q, Chen HS, Dong J, Jiang R (2003) Transmission/disequilibrium test based on haplotype sharing for tightly linked markers. Am J Hum Genet 73:566–579

  • Zhao H, Zhang S, Merikangas KR, Trixler M, Wildenauer DB, Sun F, Kidd KK (2000) Transmission/disequilibrium tests using multiple tightly linked markers. Am J Hum Genet 67:936–946

  • Zhao J, Boerwinkle E, Xiong M (2005) An entropy-based statistic for genomewide association studies. Am J Hum Genet 77:27–40

Acknowledgments

M. M. Xiong is supported by NIH-NIAMS grant IP50AR44888, HL74735, and NIH grant ES09912. J. Y. Zhao is supported by NIH grant ES09912. E. Boerwinkle is supported by grant support from the National Heart, Lung and Blood Institute and the National Institute of General Medical Sciences.

Author information

Corresponding author

Correspondence to Momiao Xiong.

Appendices

Appendix 1

Let \(X = - \hat{p}\log \hat{p}\) and \(Y = - \hat{q}\log \hat{q}.\) Both X and Y are nonlinear functions of the allele frequencies. The distribution of \(X - Y\) is asymptotically normal with mean \(X(p) - Y(q)\) and variance Var(X − Y), where

$$ \begin{aligned} \hbox{Var}(X - Y) & = [X'(p)]^{2} \frac{pq}{n} + [Y'(q)]^{2} \frac{pq}{n} + 2X'(p)Y'(q)\frac{pq}{n} \\ & = \frac{pq}{n}[X'(p) + Y'(q)]^{2} \\ & = \frac{pq}{n}(2 + \log p + \log q)^{2}. \\ \end{aligned} $$

Under the null hypothesis of no linkage or no association, the frequency of the transmitted allele \(M_{1}\) is equal to the frequency of the transmitted allele \(M_{2}\); thus, we have X(p) = Y(q). Therefore, \(X - Y\) is asymptotically normally distributed with mean zero and variance \(\frac{pq}{n}(2 + \log p + \log q)^{2}.\) Consequently, under the null hypothesis, \(\hbox{TDT}_{e} = \frac{(X - Y)^{2}}{{\rm Var}(X - Y)} = \frac{n(\hat{p}\log \hat{p} - \hat{q}\log \hat{q})^{2}}{\hat{p}\hat{q}(2 + \log \hat{p} + \log \hat{q})^{2}}\) is asymptotically distributed as a central \(\chi^{2}_{(1)}\) distribution.
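The biallelic statistic above can be evaluated directly from transmission counts. The following is a minimal sketch, not the authors' implementation: it assumes natural logarithms, and the names `tdt_entropy`, `tdt_classic`, `b` and `c` are hypothetical, with `b` and `c` denoting the numbers of heterozygous parents who transmit \(M_{1}\) and \(M_{2}\), respectively. The familiar McNemar-type TDT, (b − c)²/(b + c), is included only for comparison.

```python
import math


def tdt_entropy(b, c):
    """Entropy-based TDT for a biallelic marker (sketch of the Appendix 1 formula).

    b, c: counts of heterozygous parents transmitting M1 and M2 (assumed names).
    Natural logarithms are assumed.
    """
    if b <= 0 or c <= 0:
        raise ValueError("both counts must be positive so that the logs are defined")
    n = b + c
    p_hat, q_hat = b / n, c / n
    # Numerator: squared difference of the entropy terms X - Y.
    diff = p_hat * math.log(p_hat) - q_hat * math.log(q_hat)
    # Factor 2 + log p + log q from the variance; it vanishes when p*q = exp(-2),
    # where the large-sample approximation breaks down.
    slope = 2.0 + math.log(p_hat) + math.log(q_hat)
    if slope == 0.0:
        raise ZeroDivisionError("variance estimate is zero at p*q = exp(-2)")
    return n * diff ** 2 / (p_hat * q_hat * slope ** 2)


def tdt_classic(b, c):
    """Original TDT statistic, (b - c)^2 / (b + c), shown for comparison."""
    return (b - c) ** 2 / (b + c)


if __name__ == "__main__":
    print(tdt_entropy(60, 40), tdt_classic(60, 40))
```

Both statistics are referred to a central χ² distribution with one degree of freedom under the null hypothesis. Note that the entropy-based statistic is undefined when either count is zero.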

Appendix 2

First, we calculate \({\hbox{var} (\hat{p}_{{i \cdot}})}.\) By definition, we have

$$ \begin{aligned} \operatorname{var} (\hat p_{i \cdot }) &= \operatorname{var} \left(\frac{1}{n}\sum\limits_{j = 1}^k n_{ij} \right) \\ &= \frac{1}{n^2}\left\{ n\sum\limits_{j = 1}^k p_{ij} (1 - p_{ij}) - n\sum\limits_{j = 1}^k \sum\limits_{\substack{l = 1 \\ l \ne j}}^k p_{ij} p_{il} \right\} \\ &= \frac{1}{n}p_{i \cdot } (1 - p_{i \cdot }), \\ \end{aligned} $$

where \(p_{ii} = 0.\)

Similarly, we have

$$\hbox{var} (\hat{q}_{{i.}}) = \frac{1}{n}q_{{i.}} (1 - q_{{i.}}).$$

Next we calculate the covariance between \(\hat{p}_{i \cdot}\) and \(\hat{p}_{j \cdot}\) \((i \ne j).\) Again, by definition, we obtain

$$ \begin{aligned} \operatorname{cov} (\hat{p}_{i \cdot}, \hat{p}_{j \cdot}) & = \operatorname{cov} \left(\frac{1}{n}\sum\limits_{l = 1}^k n_{il}, \frac{1}{n}\sum\limits_{m = 1}^k n_{jm}\right) \\ & = - \frac{1}{n}\sum\limits_{l = 1}^k \sum\limits_{m = 1}^k p_{il} p_{jm} \\ & = - \frac{1}{n}p_{i \cdot} p_{j \cdot} \\ \end{aligned} $$

Similarly, we have

$$\hbox{cov} (\hat{q}_{{i.}}, \hat{q}_{{j.}}) = - \frac{1}{n}q_{{i.}} q_{{j.}}$$

Now we calculate \(\hbox{cov} (\hat{p}_{i \cdot}, \hat{q}_{j.}).\) First, we consider \(i \ne j.\) In this case, we have

$$ \begin{aligned} \operatorname{cov} (\hat{p}_{i \cdot}, \hat{q}_{j.}) & = \operatorname{cov} \left(\frac{1}{n}\sum\limits_{l = 1}^k n_{il}, \frac{1}{n}\sum\limits_{m = 1}^k n_{mj}\right) \\ & = - \frac{1}{n}\sum\limits_{\substack{l = 1 \\ l \ne j}}^k \sum\limits_{\substack{m = 1 \\ m \ne i}}^k p_{il} p_{mj} + \frac{1}{n}p_{ij} (1 - p_{ij}) \\ & = \frac{1}{n}(p_{ij} - p_{i \cdot} q_{j.}) \\ \end{aligned} $$

Similarly, we have \(\hbox{cov} (\hat{q}_{i.}, \hat{p}_{j \cdot}) = \frac{1}{n}(p_{ji} - q_{i.} p_{j \cdot})\) when \(i \ne j.\)

Then, we consider \(i = j.\) In this case, we obtain

$$ \begin{aligned} \hbox{cov} (\hat{p}_{{i \cdot }},\hat{q}_{{j.}}) & = \operatorname{cov} \left(\frac{1} {n}{\sum\limits_{m = 1}^k {n_{{im}} } },\frac{1} {n}{\sum\limits_{l = 1}^k {n_{{li}} } }\right) \\ & = - \frac{1} {n}{\sum\limits_{m = 1}^k {{\sum\limits_{l = 1}^k {p_{{im}} p_{{li}} } }} } + \frac{1} {n}p_{{ii}} \\ & = - \frac{1} {n}p_{{i \cdot }} q_{{j.}} \\ \end{aligned} $$

Thus, we have proven Eq. 6.

Let \(h(p) = [h(p_{1 \cdot}), \ldots, h(p_{k \cdot})]^{\rm T}\) and \(h(q) = [h(q_{1 \cdot}), \ldots, h(q_{k \cdot})]^{\rm T},\) where \(h(p_{i \cdot}) = - p_{i \cdot} \log p_{i \cdot}\) and \(h(q_{i.}) = - q_{i.} \log q_{i.}.\) Then \(h(\hat{p}) - h(p)\) and \(h(\hat{q}) - h(q)\) are asymptotically normally distributed with mean zero and covariance matrices \(\frac{1}{n}B\Sigma_{p} B^{\rm T}\) and \(\frac{1}{n}C\Sigma_{q} C^{\rm T},\) respectively, where \(B = (b_{ij})_{k \times k}\) and \(C = (c_{ij})_{k \times k}\) have the entries

$$ \begin{aligned} b_{ii} & = \frac{\partial h(p_{i \cdot})}{\partial p_{i \cdot}} = - 1 - \log p_{i \cdot}, \quad b_{ij} = \frac{\partial h(p_{i \cdot})}{\partial p_{j \cdot}} = 0 \quad (j \ne i), \\ c_{ii} & = \frac{\partial h(q_{i.})}{\partial q_{i.}} = - 1 - \log q_{i.}, \quad c_{ij} = \frac{\partial h(q_{i.})}{\partial q_{j.}} = 0 \quad (j \ne i). \\ \end{aligned} $$

Under the null hypothesis of no linkage or no association, we have h(p) = h(q); thus \(h(\hat{p}) - h(\hat{q}) = h(\hat{p}) - h(p) - [h(\hat{q}) - h(q)]\) is asymptotically normally distributed:

$$h(\hat{p}) - h(\hat{q}) = h(\hat{p}) - h(p) - [h(\hat{q}) - h(q)]\sim N\left(0,\frac{1}{n}\Lambda\right),$$

where

$$\Lambda = B\Sigma_{p} B^{\rm {T}} + C\Sigma_{q} C^{\rm {T}} - B\Sigma_{{pq}} C^{\rm {T}} - C\Sigma^{\rm {T}}_{{pq}} B^{\rm {T}}.$$
(11)

Applying Theorem 4.4.3 (Graybill 1976), we find that \(n[h(\hat{p}) - h(\hat{q})]^{\rm T} \Lambda^{-}_{e} [h(\hat{p}) - h(\hat{q})]\) is asymptotically distributed as a central \(\chi^{2}_{(r)}\) distribution under the null hypothesis of no linkage or no association, where \(r = \hbox{rank}(\Lambda_{e}),\) \(\Lambda^{-}_{e}\) is a generalized inverse of \(\Lambda_{e},\) and \(\Lambda_{e}\) is the estimator of the matrix Λ obtained by substituting the estimators of \(p_{i \cdot}\) and \(p_{\cdot i}\) into Eq. 11.
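The quadratic form above can be sketched numerically for a multiallelic marker. The code below is an illustration under stated assumptions, not the authors' implementation: `counts[i][j]` is taken to be the number of heterozygous parents with genotype \(M_{i}M_{j}\) who transmit \(M_{i}\) and do not transmit \(M_{j}\); the covariance blocks are built from the variances and covariances derived in this appendix (so that \(\Sigma_{pq}\) has entries \(p_{ij} - p_{i \cdot} q_{j.}\)); and the generalized inverse is taken to be the Moore-Penrose inverse, computed from an eigendecomposition. The function name and the tolerance `rel_tol` are hypothetical.

```python
import numpy as np


def entropy_tdt_multiallelic(counts, rel_tol=1e-10):
    """Sketch of n * d^T Lambda^- d with d = h(p_hat) - h(q_hat).

    counts: k x k array with a zero diagonal (assumed layout, see lead-in).
    Returns (statistic, rank of the estimated Lambda).
    """
    n_ij = np.asarray(counts, dtype=float)
    if np.any(np.diag(n_ij) != 0):
        raise ValueError("homozygous parents are uninformative; diagonal must be zero")
    n = n_ij.sum()
    p_ij = n_ij / n
    p = p_ij.sum(axis=1)  # transmitted allele frequencies p_i.
    q = p_ij.sum(axis=0)  # nontransmitted allele frequencies q_i. (= p_.i)
    if np.any(p <= 0) or np.any(q <= 0):
        raise ValueError("every allele must be transmitted and nontransmitted at least once")

    # Covariance blocks scaled by n, following the appendix derivation.
    sigma_p = np.diag(p) - np.outer(p, p)
    sigma_q = np.diag(q) - np.outer(q, q)
    sigma_pq = p_ij - np.outer(p, q)

    # Diagonal Jacobians of h(x) = -x log x.
    B = np.diag(-1.0 - np.log(p))
    C = np.diag(-1.0 - np.log(q))
    lam = (B @ sigma_p @ B.T + C @ sigma_q @ C.T
           - B @ sigma_pq @ C.T - C @ sigma_pq.T @ B.T)

    d = -p * np.log(p) + q * np.log(q)

    # Moore-Penrose inverse via eigendecomposition of the symmetric matrix Lambda.
    w, v = np.linalg.eigh((lam + lam.T) / 2.0)
    keep = w > rel_tol * max(w.max(), 0.0)
    proj = v[:, keep].T @ d
    stat = n * float(np.sum(proj ** 2 / w[keep]))
    return stat, int(keep.sum())
```

For two alleles this reduces to the biallelic statistic of Appendix 1 with one degree of freedom, which gives a convenient consistency check.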

Appendix 3

Following the approach of Sham and Curtis, we can obtain the joint probability that a heterozygous parent with genotype \(M_{i}M_{j}\) transmits the allele \(M_{i}\) to an affected child. Let TM denote the transmitted marker allele, NM the nontransmitted marker allele, TD the transmitted disease allele, OTD the disease allele transmitted by the other parent, and TH the transmitted haplotype. Let \(P_{i}\) be the frequency of the allele \(M_{i}\) at the marker locus, \(P_{D_{k}}\) be the frequency of the disease allele \(D_{k},\) \(P_{ki}\) be the frequency of the haplotype \(H_{D_{k}M_{i}},\) and θ be the recombination fraction between the marker and disease loci. Define the measure of LD between the marker and disease loci as

$$\delta_{{kj}} = P_{{kj}} - P_{{{\rm D}_{k}}} P_{j}.$$

Then, the probability that the haplotype \({H_{{{\rm D}_{k} M_{i}}}}\) is transmitted and the marker allele M j is not transmitted is given by

$$P(\hbox{TH} = \hbox{D}_{k} M_{i}, NM = M_{j}) = (1 - \theta)P_{{ki}} P_{j} + \theta P_{{kj}} P_{i}$$
(12)

The joint probability that a heterozygous parent with genotype M i M j transmits the allele M i to an affected child is given by

$$ \begin{aligned} P(\hbox{TM} & = M_{i}, \hbox{NM} = M_{j} |A) \\ & = \frac{1} {{P(A)}}P(\hbox{TM} = M_{i}, \hbox{NM} = M_{j},A) \\ & = \frac{1} {{P(A)}}{\sum\limits_k {{\sum\limits_l {P(\hbox{TH} = \hbox{D}_{k} M_{i}, \hbox{NM} = M_{j} )P(\hbox{D}_{l} |\hbox{TH} = \hbox{D}_{k} M_{i}, \hbox{NM} = M_{j})P(A|\hbox{D}_{k} \hbox{D}_{l}).} }} } \\ \end{aligned} $$

Using Eq. 12, we have

$$ \begin{aligned} P(\hbox{TM} & = M_{i}, \hbox{NM} = M_{j} |A) \\ & = \frac{1} {{P(A)}}{\sum\limits_k {{\sum\limits_l {f_{{kl}} } }} }P_{{{\rm D}_{l} }} [(1 - \theta)P_{{ki}} P_{j} + \theta P_{{kj}} P_{i} ] \\ & = P_{i} P_{j} + \frac{1} {{P(A)}}[P_{j} {\sum\limits_k {{\sum\limits_l {f_{{kl}} } }} }P_{{{\rm D}_{l} }} \delta _{{ki}} + \theta {\sum\limits_k {{\sum\limits_l {f_{{kl}} P_{{{\rm D}_{l} }} } }} }(P_{i} \delta _{{kj}} - P_{j} \delta _{{ki}})] \\ \end{aligned} $$

where notations are given in the text.

Summing over all j in the above equation, we obtain the probability of transmitting the marker allele \(M_{i}\) to an affected child:

$$ \begin{aligned} P_{i.} & = P(\hbox{TM} = M_{i} |\hbox{Affected}) = P_{i} + \frac{1}{P(A)}\left(\sum\limits_k \sum\limits_l f_{kl} P_{{\rm D}_{l}} \delta_{ki} - \theta \sum\limits_k \sum\limits_l f_{kl} P_{{\rm D}_{l}} \delta_{ki}\right) \\ & = P_{i} + \frac{1 - \theta}{P(A)}[(f_{11} - f_{12})P_{\rm D} + (f_{12} - f_{22})P_{\rm d}]\delta_{1i} \\ \end{aligned} $$

Similarly, we have

$$p_{{.j}} = P_{j} + \frac{\theta}{{P(A)}}[(f_{{11}} - f_{{12}})P_{{\rm D}} + (f_{{12}} - f_{{22}})P_{{\rm d}}]\delta_{{1j}}$$
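The two expressions above are easy to evaluate for a biallelic disease locus. The sketch below is illustrative only: the function and argument names are hypothetical, and the prevalence is assumed to be \(P(A) = f_{11}P_{\rm D}^{2} + 2f_{12}P_{\rm D}P_{\rm d} + f_{22}P_{\rm d}^{2}\) (Hardy-Weinberg proportions at the disease locus), which is consistent with the identity \(bP_{\rm D} + a = 1\) used later in this appendix.

```python
def transmission_frequencies(P_i, P_D, delta_1i, theta, f11, f12, f22):
    """Frequencies of marker allele M_i among transmitted and nontransmitted alleles.

    P_i      : population frequency of marker allele M_i
    P_D      : frequency of disease allele D (P_d = 1 - P_D)
    delta_1i : LD between D and M_i
    theta    : recombination fraction between marker and disease loci
    f11, f12, f22 : penetrances of genotypes DD, Dd, dd
    Assumes Hardy-Weinberg proportions when computing P(A).
    """
    P_d = 1.0 - P_D
    P_A = f11 * P_D ** 2 + 2.0 * f12 * P_D * P_d + f22 * P_d ** 2
    b = ((f11 - f12) * P_D + (f12 - f22) * P_d) / P_A
    transmitted = P_i + (1.0 - theta) * b * delta_1i
    nontransmitted = P_i + theta * b * delta_1i
    return transmitted, nontransmitted


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the formulas.
    print(transmission_frequencies(0.3, 0.1, 0.02, 0.01, 0.9, 0.5, 0.1))
```

The two frequencies coincide when there is no LD (δ = 0) or when the loci are unlinked (θ = 1/2), which is the null hypothesis of the TDT.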

Note that

$$ \begin{aligned} &\delta_{1j} + \delta_{2j} = P_{1j} - P_{\rm D} P_{j} + P_{2j} - P_{\rm d} P_{j} = P_{j} - P_{j} = 0, \\ &\min (P_{\rm D}, P_{j}) \geqslant P_{1j} = \delta_{1j} + P_{\rm D} P_{j} \geqslant 0, \\ &\min (P_{\rm d}, P_{j}) \geqslant P_{2j} = - \delta_{1j} + (1 - P_{\rm D})P_{j} \geqslant 0. \end{aligned} $$

Thus, the measure \(\delta_{1j}\) of the LD between the disease allele D and the marker allele \(M_{j}\) should satisfy the following constraints:

$$\min (P_{j} P_{{\rm d}}, (1 - P_{j})P_{{\rm D}}) \geqslant \delta_{{1j}} \geqslant \max (- P_{{\rm D}} P_{j}, - P_{{\rm d}} (1 - P_{j}))$$
(13)

The expected value of the measure \(\delta_{1j}\) of the LD between the disease allele and the marker allele \(M_{j}\) can be calculated by

$$E[\delta_{{1j}} (t)] = \delta_{{1j}} (0)(1 - \theta)^{t}$$
(14)

where t is the number of generations since the LD between the marker and disease loci was created, and \(\delta_{1j}(0)\) is the measure of the initial LD at that time. The initial measure \(\delta_{1j}(0)\) of the LD should satisfy the constraints in Eq. 13.
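As a small illustration of Eqs. 13 and 14, the sketch below checks that an initial LD value lies within the admissible range and then applies the geometric decay. The function names `ld_bounds` and `expected_ld` are hypothetical.

```python
def ld_bounds(P_D, P_j):
    """Admissible range (lower, upper) of delta_1j from Eq. 13."""
    P_d = 1.0 - P_D
    lower = max(-P_D * P_j, -P_d * (1.0 - P_j))
    upper = min(P_j * P_d, (1.0 - P_j) * P_D)
    return lower, upper


def expected_ld(delta0, theta, t, P_D, P_j):
    """Expected LD after t generations, Eq. 14: delta0 * (1 - theta)^t."""
    lower, upper = ld_bounds(P_D, P_j)
    if not lower <= delta0 <= upper:
        raise ValueError("initial LD violates the constraints of Eq. 13")
    return delta0 * (1.0 - theta) ** t


if __name__ == "__main__":
    # Hypothetical values: P_D = 0.1, P_j = 0.3, initial LD 0.05, theta = 0.01.
    print(ld_bounds(0.1, 0.3))
    print(expected_ld(0.05, 0.01, 100, 0.1, 0.3))
```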

Now we study how to calculate the frequencies of the transmitted and nontransmitted haplotypes. Recall that TH and NH denote the transmitted and nontransmitted haplotypes, respectively. The transmitted three-locus haplotype can result from a nonrecombinant, a single-recombinant, or a double-recombinant event (Wilson 1997). Thus, we have

$$ \begin{aligned} P(\hbox{TH} & = H_{{i_{1} si_{2} }}, \hbox{NH} = H_{{j_{1} j_{2} }} ) \\ & = (1 - \theta _{1})(1 - \theta _{2})P_{{i_{1} si_{2} }} P_{{j_{1} j_{2} }} + \theta _{1} (1 - \theta _{2})P_{{j_{1} si_{2} }} P_{{i_{1} j_{2} }} + (1 - \theta _{1})\theta _{2} P_{{i_{1} sj_{2} }} P_{{j_{1} i_{2} }} + \theta _{1} \theta _{2} P_{{j_{1} sj_{2} }} P_{{i_{1} i_{2} }} \\ \end{aligned} $$
$$ \begin{aligned} P(\hbox{TH} & = H_{i_{1} i_{2}}, \hbox{NH} = H_{j_{1} j_{2}}, \hbox{Affected}) \\ & = \sum\limits_s \sum\limits_u f_{su} P_{{\rm D}_{u}} [(1 - \theta_{1})(1 - \theta_{2})P_{i_{1} si_{2}} P_{j_{1} j_{2}} + \theta_{1} (1 - \theta_{2})P_{j_{1} si_{2}} P_{i_{1} j_{2}} \\ & \quad + (1 - \theta_{1})\theta_{2} P_{i_{1} sj_{2}} P_{j_{1} i_{2}} + \theta_{1} \theta_{2} P_{j_{1} sj_{2}} P_{i_{1} i_{2}} ] \\ \end{aligned} $$
(15)

Let \(b = \frac{(f_{11} - f_{12})P_{\rm D} + (f_{12} - f_{22})P_{\rm d}}{P(A)}\) and \(a = \frac{f_{12} P_{\rm D} + f_{22} P_{\rm d}}{P(A)}.\) Then, after some algebra on Eq. 15, we can obtain

$$ \begin{aligned} P(\hbox{TH} & = H_{{i_{1} i_{2} }}, \hbox{NH} = H_{{j_{1} j_{2} }} |\hbox{Affected}) \\ & = (1 - \theta _{1})(1 - \theta _{2})P_{{j_{1} j_{2} }} (bP_{{i_{1} \hbox{D}i_{2} }} + aP_{{i_{1} i_{2} }}) + \theta _{1} (1 - \theta _{2})P_{{i_{1} j_{2} }} (bP_{{j_{1} \hbox{D}i_{2} }} + aP_{{j_{1} i_{2} }}) \\ & \quad + (1 - \theta _{1})\theta _{2} P_{{j_{1} i_{2} }} (bP_{{i_{1} \hbox{D}j_{2} }} + aP_{{i_{1} j_{2} }}) + \theta _{1} \theta _{2} P_{{i_{1} i_{2} }} (bP_{{j_{1} \hbox{D}j_{2} }} + aP_{{j_{1} j_{2} }}) \\ \end{aligned} $$
(16)

Thus, the probability that the haplotype \(H_{i_{1}i_{2}}\) is transmitted to an affected child is given by

$$ \begin{aligned} P(\hbox{TH} & = H_{i_{1} i_{2}} |\hbox{Affected}) = \sum\limits_{j_{1}} \sum\limits_{j_{2}} P(\hbox{TH} = H_{i_{1} i_{2}}, \hbox{NH} = H_{j_{1} j_{2}} |\hbox{Affected}) \\ & = (1 - \theta_{1})(1 - \theta_{2})(bP_{i_{1} \hbox{D}i_{2}} + aP_{i_{1} i_{2}}) + \theta_{1} (1 - \theta_{2})P_{i_{1}} (bP_{\hbox{D}i_{2}} + aP_{i_{2}}) \\ & \quad + (1 - \theta_{1})\theta_{2} P_{i_{2}} (bP_{i_{1} \hbox{D}} + aP_{i_{1}}) + \theta_{1} \theta_{2} P_{i_{1} i_{2}} \\ \end{aligned} $$
(17)

(Note that \(bP_{\rm D} + a = 1.\))

Similarly, we have

$$ \begin{aligned} P(\hbox{NH} = H_{j_{1} j_{2}} |\hbox{Affected}) & = (1 - \theta_{1})(1 - \theta_{2})P_{j_{1} j_{2}} + \theta_{1} (1 - \theta_{2})P_{j_{2}} (bP_{j_{1} \hbox{D}} + aP_{j_{1}}) \\ & \quad + (1 - \theta_{1})\theta_{2} P_{j_{1}} (bP_{\hbox{D}j_{2}} + aP_{j_{2}}) + \theta_{1} \theta_{2} (bP_{j_{1} \hbox{D}j_{2}} + aP_{j_{1} j_{2}}) \\ \end{aligned} $$
(18)

Using the following relationship between the haplotype and the measure of LD:

$$\begin{aligned} &P_{{i_{1} \hbox{D}i_{2}}} = \delta_{{i_{1} \hbox{D}i_{2}}} + P_{{i_{1}}} \delta_{{\hbox{D}i_{2}}} + P_{{i_{2}}} \delta_{{i_{1} \hbox{D}}} + P_{{i_{1}}} P_{{\rm D}} P_{{i_{2}}},\\ &\,\hbox{and } \delta_{{i_{1} \hbox{D}}} = P_{{i_{1} \hbox{D}}} - P_{{i_{1}}} P_{{\rm D}}, \delta_{{\hbox{D}i_{2}}} = P_{{\hbox{D}i_{2}}} - P_{{\rm D}} P_{{i_{2}}}, \delta_{{i_{1} i_{2}}} = P_{{i_{1} i_{2}}} - P_{{i_{1}}} P_{{i_{2}}}, \end{aligned} $$

we obtain

$$ \begin{aligned} P(\hbox{TH} & = H_{{i_{1} i_{2} }} |\hbox{Affected}) = [a + (1 - a)\theta _{1} \theta _{2} ]P_{{i_{1} }} P_{{i_{2} }} + b(1 - \theta _{1})(1 - \theta _{2})\delta _{{i_{1} \hbox{D}i_{2} }} + bP_{{i_{2} }} (1 - \theta _{1})\delta _{{i_{1} \hbox{D}}} \\ & \quad + bP_{{i_{1} }} (1 - \theta _{2})\delta _{{\hbox{D}i_{2} }} + [\theta _{1} \theta _{2} + (1 - \theta _{1})(1 - \theta _{2})a]\delta _{{i_{1} i_{2} }} \\ \end{aligned} $$
$$ \begin{aligned} P(\hbox{NH} & = H_{{j_{1} j_{2} }} |\hbox{Affected}) = [a\theta _{2} + (1 - \theta _{2})(1 - \theta _{1} + a\theta _{1})]P_{{j_{1} }} P_{{j_{2} }} + b\theta _{1} \theta _{2} \delta _{{j_{1} \hbox{D}j_{2} }} + bP_{{j_{2} }} \theta _{1} \delta _{{j_{1} \hbox{D}}} \\ & \quad + bP_{{j_{1} }} \theta _{2} \delta _{{\hbox{D}j_{2} }} + [a\theta _{1} \theta _{2} + (1 - \theta _{1})(1 - \theta _{2})]\delta _{{j_{1} j_{2} }} \\ \end{aligned} $$
(19)

Measures of LD are random variables. The expectation of \(\delta_{i_{1}si_{2}}\) is equal to

$$E[\delta_{{i_{1} si_{2}}}] = E[P_{{i_{1} si_{2}}}] - P_{{i_{1}}} E[\delta_{{si_{2}}}] - P_{{i_{2}}} E[\delta_{{i_{1} s}}] - P_{{i_{1}}} P_{{{\rm D}_{s}}} P_{{i_{2}}}.$$

However, it has been shown that \(E[P_{i_{1} si_{2}}] = \delta_{i_{1} si_{2}} (0)(1 - \theta_{1})^{t} (1 - \theta_{2})^{t} + P_{i_{1}} E[\delta_{si_{2}}] + P_{i_{2}} E[\delta_{i_{1} s}] + P_{i_{1}} P_{{\rm D}_{s}} P_{i_{2}},\) where \(\delta_{i_{1}si_{2}} (0)\) is the measure of the initial LD at the three loci \(M_{i_{1}}D_{s}M_{i_{2}}.\) Thus, we have

$$E[\delta_{{i_{1} si_{2}}}] = \delta_{{i_{1} si_{2}}} (0)(1 - \theta_{1})^{t} (1 - \theta_{2})^{t}$$
(20)

Substituting \(E\left[\delta_{i_{1}s}\right]\) and \(E\left[\delta_{si_{2}}\right]\) from Eq. 14 and \(E\left[\delta_{i_{1}si_{2}}\right]\) from Eq. 20 into Eq. 19, we obtain

$$ \begin{aligned} P(\hbox{TH} & = H_{{i_{1} i_{2} }} |\hbox{Affected}) = [a + (1 - a)\theta _{1} \theta _{2} ]P_{{i_{1} }} P_{{i_{2} }} + b\delta _{{i_{1} \hbox{D}i_{2} }} (0)(1 - \theta _{1})^{{t + 1}} (1 - \theta _{2})^{{t + 1}} + bP_{{i_{2} }} \delta _{{i_{1} \hbox{D}}} (0)(1 - \theta _{1})^{{t + 1}} \\ & \quad + bP_{{i_{1} }} \delta _{{\hbox{D}i_{2} }} (0)(1 - \theta _{2})^{{t + 1}} + [\theta _{1} \theta _{2} + (1 - \theta _{1})(1 - \theta _{2} )a]\delta _{{i_{1} i_{2} }} (0)(1 - \theta _{1} - \theta _{2})^{t} \\ \end{aligned} $$
$$ \begin{aligned} P(\hbox{NH} & = H_{j_{1} j_{2}} |\hbox{Affected}) = [a\theta_{2} + (1 - \theta_{2})(1 - \theta_{1} + a\theta_{1})]P_{j_{1}} P_{j_{2}} + b\delta_{j_{1} \hbox{D}j_{2}} (0)\theta_{1} \theta_{2} (1 - \theta_{1})^{t} (1 - \theta_{2})^{t} + bP_{j_{2}} \delta_{j_{1} \hbox{D}} (0)\theta_{1} (1 - \theta_{1})^{t} \\ & \quad + bP_{j_{1}} \delta_{\hbox{D}j_{2}} (0)\theta_{2} (1 - \theta_{2})^{t} + [\theta_{1} \theta_{2} a + (1 - \theta_{1})(1 - \theta_{2})]\delta_{j_{1} j_{2}} (0)(1 - \theta_{1} - \theta_{2})^{t} \\ \end{aligned} $$

Letting \(P_{i_{1} i_{2} \cdot} = P(\hbox{TH} = H_{i_{1} i_{2}} |\hbox{Affected})\) and \(P_{\cdot i_{1} i_{2}} = P(\hbox{NH} = H_{i_{1} i_{2}} |\hbox{Affected}),\) we obtain the equations

$$ \begin{aligned} P_{i_{1} i_{2} \cdot} & = P_{i_{1}} P_{i_{2}} + b\delta_{i_{1} \hbox{D}i_{2}} (0)(1 - \theta_{1})^{t + 1} (1 - \theta_{2})^{t + 1} + bP_{i_{2}} \delta_{i_{1} \hbox{D}} (0)(1 - \theta_{1})^{t + 1} + bP_{i_{1}} \delta_{\hbox{D}i_{2}} (0)(1 - \theta_{2})^{t + 1} \\ & \quad + [\theta_{1} \theta_{2} + (1 - \theta_{1})(1 - \theta_{2})a]\delta_{i_{1} i_{2}} (0)(1 - \theta_{1} - \theta_{2})^{t} \\ \end{aligned} $$
$$ \begin{aligned} P_{\cdot i_{1} i_{2}} & = P_{i_{1}} P_{i_{2}} + b\delta_{i_{1} \hbox{D}i_{2}} (0)\theta_{1} \theta_{2} (1 - \theta_{1})^{t} (1 - \theta_{2})^{t} + bP_{i_{2}} \delta_{i_{1} \hbox{D}} (0)\theta_{1} (1 - \theta_{1})^{t} + bP_{i_{1}} \delta_{\hbox{D}i_{2}} (0)\theta_{2} (1 - \theta_{2})^{t} \\ & \quad + [(1 - \theta_{1})(1 - \theta_{2}) + \theta_{1} \theta_{2} a]\delta_{i_{1} i_{2}} (0)(1 - \theta_{1} - \theta_{2})^{t} \\ \end{aligned} $$

Based on the above equations, we can calculate

$$\mu_{T} = [- P_{{11 \cdot}} \log P_{{11 \cdot}}, - P_{{12 \cdot}} \log P_{{12 \cdot}}, - P_{{21 \cdot}} \log P_{{21 \cdot}}, - P_{{22 \cdot}} \log P_{{22 \cdot}}]^{T}$$
$$\mu_{{NT}} = [- P_{{\cdot 11}} \log P_{{\cdot 11}}, - P_{{\cdot 12}} \log P_{{\cdot 12}}, - P_{{\cdot 21}} \log P_{{\cdot 21}}, - P_{{\cdot 22}} \log P_{{\cdot 22}}]^{\hbox{T}}.$$

The matrices \(\Sigma_{p},\) \(\Sigma_{q},\) B, C, and Λ can be similarly defined. Then, for haplotypes produced by two SNP marker loci flanking a disease locus, substituting \(\mu_{T},\) \(\mu_{NT},\) and the other parameters \(\Sigma_{p},\) \(\Sigma_{q},\) B, C, and Λ into Eq. 9, we obtain the noncentrality parameter \(\lambda_{HE}.\) Using these analytic formulas for the noncentrality parameters of the distributions of the test statistics, we can calculate the power of the test statistics under a specified alternative hypothesis.
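To make the haplotype calculation concrete, the sketch below evaluates Eqs. 17 and 18 directly from a table of three-locus haplotype frequencies and then forms the entropy vectors \(\mu_{T}\) and \(\mu_{NT}\). It is an illustration under stated assumptions rather than the authors' code: `P[i1][s][i2]` is a hypothetical layout for the frequency of the haplotype \(M_{i_{1}}D_{s}M_{i_{2}}\) with `s = 0` for D and `s = 1` for d, the prevalence is computed assuming Hardy-Weinberg proportions, and natural logarithms are used. It works with current haplotype frequencies and does not model the decay of LD over generations.

```python
import math


def haplotype_transmission(P, theta1, theta2, f11, f12, f22):
    """Transmitted (TH) and nontransmitted (NH) two-marker haplotype
    frequencies among affected children, following Eqs. 17 and 18.

    P[i1][s][i2]: three-locus haplotype frequencies (assumed layout; s = 0 is D).
    """
    k1, k2 = len(P), len(P[0][0])
    P_D = sum(P[i][0][j] for i in range(k1) for j in range(k2))
    P_d = 1.0 - P_D
    P_A = f11 * P_D ** 2 + 2.0 * f12 * P_D * P_d + f22 * P_d ** 2  # HWE assumed
    b = ((f11 - f12) * P_D + (f12 - f22) * P_d) / P_A
    a = (f12 * P_D + f22 * P_d) / P_A

    # Marginal frequencies needed by the recombinant terms.
    P12 = [[P[i][0][j] + P[i][1][j] for j in range(k2)] for i in range(k1)]
    P1 = [sum(P12[i]) for i in range(k1)]
    P2 = [sum(P12[i][j] for i in range(k1)) for j in range(k2)]
    P1D = [sum(P[i][0]) for i in range(k1)]
    PD2 = [sum(P[i][0][j] for i in range(k1)) for j in range(k2)]

    nr = (1 - theta1) * (1 - theta2)  # no recombination
    r1 = theta1 * (1 - theta2)        # recombination in the first interval only
    r2 = (1 - theta1) * theta2        # recombination in the second interval only
    dr = theta1 * theta2              # double recombination

    TH = [[nr * (b * P[i][0][j] + a * P12[i][j])
           + r1 * P1[i] * (b * PD2[j] + a * P2[j])
           + r2 * P2[j] * (b * P1D[i] + a * P1[i])
           + dr * P12[i][j]
           for j in range(k2)] for i in range(k1)]
    NH = [[nr * P12[i][j]
           + r1 * P2[j] * (b * P1D[i] + a * P1[i])
           + r2 * P1[i] * (b * PD2[j] + a * P2[j])
           + dr * (b * P[i][0][j] + a * P12[i][j])
           for j in range(k2)] for i in range(k1)]
    return TH, NH


def entropy_vector(freqs):
    """Row-major vector of -x log x, e.g. the ordering 11, 12, 21, 22."""
    return [-x * math.log(x) for row in freqs for x in row]
```

Applying `entropy_vector` to `TH` and `NH` gives vectors ordered as in the definitions of \(\mu_{T}\) and \(\mu_{NT}\) above. In the absence of any LD both tables reduce to the products of the marker allele frequencies, so the two entropy vectors coincide.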

About this article

Cite this article

Zhao, J., Boerwinkle, E. & Xiong, M. An entropy-based genome-wide transmission/disequilibrium test. Hum Genet 121, 357–367 (2007). https://doi.org/10.1007/s00439-007-0322-6

  • DOI: https://doi.org/10.1007/s00439-007-0322-6
