Introduction

Human malaria is caused by four Plasmodium parasites, P. falciparum, P. vivax, P. ovale, and P. malariae, although outbreaks of human infection with the monkey malaria parasite P. knowlesi have also been reported (Singh et al. 2004). Among the human parasites, P. vivax is the most widespread species, and P. falciparum is the most deadly, killing 1–2 million people each year in tropical and subtropical regions, particularly in Africa, where P. falciparum malaria inflicts a heavy burden on human health and prosperity (Cox et al. 2007; Snow et al. 2005).

Parasite life cycle and genome

The malaria parasite has a complex life cycle involving a vertebrate host and mosquito vector. The life cycle of P. falciparum consists of multiple rounds of asexual replication in the human host and both asexual (replication in the oocyst stage) and sexual reproduction in Anopheles mosquitoes. When a mosquito carrying sporozoites bites an individual, some parasites enter the blood stream and rapidly invade hepatocytes. After replicating asexually to generate several thousand merozoites, the parasite exits the liver and invades erythrocytes. Numerous rounds of asexual reproduction follow, with repeated invasion of erythrocytes, developing from ring to trophozoite to schizont stages every 48 h, resulting in a dramatic increase in the number of parasites circulating in the host. For as yet undetermined reasons, some parasites switch into sexual stages, known as gametocytes. It takes about 12–14 days for a committed merozoite to develop into a mature gametocyte (Hawking et al. 1971), which circulates in the bloodstream and is taken up by the female mosquito during a blood meal. Within the mosquito midgut, mature male and female gametes emerge from infected erythrocytes and fuse to form a zygote, the brief diploid phase of the parasite life cycle that is otherwise entirely haploid. Meiosis follows within 3 h of fertilization (Sinden and Hartley 1985), and the parasite develops into an ookinete that crosses the midgut wall and grows into an oocyst. Mitotic division within the oocyst produces thousands of sporozoites that break out and travel by the hemolymph to the mosquito salivary glands. These sporozoites are then injected into a human host when the mosquito takes its next blood meal. The sexual stages are essential for transmission, and therefore both the transmission rate and the frequency of infections containing multiple genotypes determine the effective recombination rate that is a key consideration in genetic mapping studies.

The parasite genome comprises ~23 Mb and is predicted to encode ~5,400 genes located on 14 chromosomes, along with a 35-kb circular plastid genome and a 6-kb mitochondrial genome (Gardner et al. 1993; Gardner et al. 2002; Suplick et al. 1988; Wilson et al. 1996). Approximately 60% of the predicted genes encode hypothetical proteins, although recent re-annotation of the genome has assigned putative functions to many additional genes (PlasmoDB v5.4; http://www.plasmodb.org/plasmo/home.jsp). Deciphering the functions of these genes and their interactions presents a great challenge to malaria researchers. Furthermore, applying the newly acquired molecular knowledge to disease control, for example, by identifying novel drug or vaccine targets, will be another difficult task. The availability of the genome sequence has opened up the malaria research field to the application of genome-based methodologies. Microarrays printed with synthetic oligonucleotides, PCR products from cDNA libraries and Affymetrix array chips have all been used to study gene expression and regulation during parasite development (Ben Mamoun et al. 2001; Bozdech et al. 2003; Daily et al. 2007; Hayward et al. 2000; Le Roch et al. 2002; Rathod et al. 2002; Volkman et al. 2002), to detect copy number (CN) changes and nucleotide substitutions (Carret et al. 2005; Jiang et al. 2008b; Kidgell et al. 2006; Ribacke et al. 2007), and recently to study regulatory RNA (Mourier et al. 2008). The genome sequence also provides a resource for high-throughput proteomic analyses of various parasite stages (Florens et al. 2002; Hall et al. 2005; Khan et al. 2005; Lasonder et al. 2002; Sam-Yellowe et al. 2004). These global expression and structural analyses have provided important leads to gene function and gene regulation of the parasite, but do not identify specific functions for individual genes.
Decoding individual gene function will require multiple cross-disciplinary approaches, including methods from biochemistry, cell biology, physiology, and genetics. Genetic transformation is one useful technique that is playing an increasingly important role in studying gene function (Crabb and Cowman 1996; Crabb et al. 1997; Maier et al. 2008; van Dijk et al. 1995; Wu et al. 1995; Wu and Wellems 1996), especially for genes that are active in the sexual stages (Furuya et al. 2005; Lobo et al. 1999; Menard et al. 1997; van Dijk et al. 2001). However, because malaria parasites are haploid, lethal disruptions or fitness costs affecting growth limit the ability to generate knockouts for many genes expressed in the asexual stages.

Drug resistance in the P. falciparum malaria parasite

Chloroquine (CQ) resistance (CQR)

The majority of studies on drug resistance have focused on P. falciparum, although CQR in P. vivax has received increasing attention (Baird et al. 2007; Ratcliff et al. 2007; White 1998). CQR in P. falciparum was reported in two foci on the Thai–Cambodia border and in Colombia in the late 1950s and early 1960s, respectively, and has since spread to all the major malaria endemic regions, including Africa and South America (Fig. 1a) (Chen et al. 2003; Payne 1987; Vieira et al. 2001; Wootton et al. 2002). Mutations in a putative transporter, PfCRT (P. falciparum chloroquine resistance transporter), determine CQR, in particular a substitution at amino acid position 76 (Cooper et al. 2002; Djimde et al. 2001; Fidock et al. 2000; Sidhu et al. 2002). Mutations and CN changes in another gene (pfmdr1) that encodes a homolog of the human multi-drug resistance p-glycoprotein (PfPgh1) were also associated with CQR or pfcrt mutations (Djimde et al. 2001; Duraisingh et al. 2000; Foote et al. 1990; Foote et al. 1989; Mu et al. 2003), but the contribution of pfmdr1 to CQR is still unknown (Hayton and Su 2004). Additionally, parasite isolates that carry the same pfcrt and pfmdr1 alleles, but have different CQ phenotypes as measured by IC50, suggest that additional genes modulate the parasite response to CQ (Ferdig et al. 2004; Mu et al. 2003). These additional genes may encode various transporters, for example, other ABC transporters that can modulate food vacuole pH (Cooper et al. 2005; Jiang et al. 2006; Zhang et al. 2002) or could include molecules involved in GSH-mediated degradation of ferriprotoporphyrin (FP) and/or GSH drug adduct transport (Borst et al. 2000; Ginsburg and Golenser 2003).

Fig. 1

The spread of CQR- and high-level PYR-resistant P. falciparum can be traced to only a few independent origins. a Mutations in pfcrt conferring CQR originated and spread from at least five independent foci, including a single origin of all CQR alleles in Southeast Asia and Africa, two independent origins in South America, one in Papua New Guinea, and at least one origin in Melanesia (Chen et al. 2003; Wootton et al. 2002). Areas colored in red indicate the presence of CQR P. falciparum, while black circles indicate the spread of CQR parasites from the origins. b Triple or quadruple mutant pfdhfr alleles conferring high-level PYR resistance can also be traced to a few origins. All alleles found in Southeast Asia and the majority of African parasites share a common ancestor, with at least one other independent origin in South America, one additional African origin in Kenya and an independent origin in Melanesia (Cortese et al. 2002; Maiga et al. 2007; McCollum et al. 2007; McCollum et al. 2006; Mita et al. 2007; Nair et al. 2003; Roper et al. 2003; Roper et al. 2004). The labeling of origins represents approximate reported locations, not the actual origins or the scale of a region that a particular allele covers. More independent pfdhfr triple or quadruple mutants are expected when more samples from different endemic regions are typed. Areas shaded in yellow are confirmed to harbor both CQR and high-level PYR-resistant parasites, while areas colored in red indicate the presence of CQR P. falciparum, where high-level PYR resistance is either absent or unknown. CQ chloroquine, PYR pyrimethamine, pfcrt P. falciparum chloroquine resistance transporter, dhfr dihydrofolate reductase. This figure was modified from Wellems (2004) and Guerra et al. (2008)

Sulfadoxine-pyrimethamine (SP or Fansidar) resistance

Resistance to SP, a combination of pyrimethamine (PYR) and sulfadoxine (SDX), was also first reported in Thailand (Chongsuphajaisiddhi et al. 1979). Both drugs are antifolates and exhibit a high degree of synergy when administered together (Chulay et al. 1984). PYR is structurally similar to dihydrofolate, and SDX is an analog of p-aminobenzoic acid (PABA); therefore, PYR and SDX are competitive inhibitors of dihydrofolate reductase (DHFR) and dihydropteroate synthetase (DHPS), respectively. These are key enzymes in the folate pathway, and mutations in dhfr and dhps have been demonstrated to confer resistance to PYR and SDX (Cowman et al. 1988; Peterson et al. 1990; Peterson et al. 1988; Reeder et al. 1996; Siriwaraporn 1998; Triglia et al. 1997). However, one major difference between PYR resistance and CQR is the number of independent mutation events, known as founder mutations. There have been only a limited number of founder mutations for CQR (five described to date) (Fig. 1a) (Chen et al. 2003; Wootton et al. 2002). In contrast, low levels of PYR resistance can arise quite frequently, because the resistance is conferred by a single point mutation in the dhfr gene. Higher levels of PYR resistance, however, require multiple additional mutations in dhfr that also appear to have limited origins and different scales of selective sweeps (Fig. 1b) (Cortese et al. 2002; Maiga et al. 2007; McCollum et al. 2007; McCollum et al. 2006; Mita et al. 2007; Nair et al. 2003; Roper et al. 2003; Roper et al. 2004). In addition, mutations and/or gene expression changes in other genes in the folate metabolism pathway may also contribute to SP resistance (Kidgell et al. 2006; Volkman et al. 2007; Wang et al. 1997).

Mefloquine (MQ) resistance (MQR)

MQ had been extensively used only in Southeast Asia, and parasites resistant to MQ were reported only a few years after the introduction of the drug (Boudreau et al. 1982; Nosten et al. 1991). Currently, MQ is ineffective on the Thai–Cambodia border and has been removed from the treatment plans of many countries in the region. Amplification of pfmdr1 (increased copy number, CN) has been associated with increased MQ IC50 (Cowman et al. 1994; Peel et al. 1994; Price et al. 1999; Price et al. 2004; Sidhu et al. 2006; Wilson et al. 1989), and disruption of one of two copies of pfmdr1 in the FCB parasite led to a decrease in IC50 to MQ, ART, lumefantrine, halofantrine (HAL), and QN (Sidhu et al. 2006). However, several lines of evidence question the role of amplified pfmdr1 in MQR: (1) MQ selection of P. falciparum parasites does not always lead to increased pfmdr1 CN (Lim et al. 1996); (2) not all field studies showed an association of MQR with increased pfmdr1 CN (Chaiyaroj et al. 1999); and (3) because increased MQ IC50 levels appear to correlate with increased IC50 in parasite response to QN and HAL, higher pfmdr1 CN could be due to selection by QN and/or other drugs (Cowman et al. 1994). Support for the cross-resistance theory came from Africa, where parasites resistant to MQ were detected in areas where QN, but not MQ, has been widely used (Brasseur et al. 1991; Brasseur et al. 1992; Lobel et al. 1998). Therefore, because pfmdr1 appears to play a role in parasite responses to many drugs, the association of pfmdr1 CN and increased MQ IC50 could simply reflect the combined results of extensive selection by multiple drugs on P. falciparum parasites. The association of higher levels of MQ IC50 with higher pfmdr1 CN could also reflect low-level compensatory responses to MQ selection. Also, amplification of pfmdr1 may be a broad parasite compensatory response to physiologic changes in the food vacuole due to mutations in pfcrt and other genes (Jiang et al. 2008a).

Increased pfmdr1 CN predicts potentially higher levels of mRNA transcript and its protein product, PfPgh1; however, few studies show good correlation of pfmdr1 CN with its mRNA or protein expression levels. Higher levels of mRNA and PfPgh1 appeared to correlate with higher IC50 levels in laboratory selected parasites (Cowman et al. 1994; Jiang et al. 2008a) and genetic cross progeny (Rohrbach et al. 2006) but not in other studies (Lim et al. 1996). Firm correlation of pfmdr1 CN and mRNA or protein levels will require further investigation. Considering all the factors that can influence pfmdr1 CN and the difficulties to date in accurately measuring PfPgh1 levels, the precise role of pfmdr1 in MQR remains an open question.

Quinine resistance (QNR)

QN has been used to treat malaria for hundreds of years and is still effective in treating P. falciparum and other parasites, particularly when combined with other drugs (Ejaz et al. 2007; Pukrittayakamee et al. 2000), although its efficacy is declining in some endemic regions (Pukrittayakamee et al. 2000; Pukrittayakamee et al. 1994; Zalis et al. 1998). The molecular basis of QNR, as defined by an increase of in vitro IC50, remains uncertain, although various molecules have been reported to play a role in parasite response to QN. pfcrt, a gene encoding a putative Na+/H+ exchanger (PfNHE), pfmdr1, and a locus on chromosome 9 have all been associated with higher levels of IC50 in progeny of a genetic cross (Ferdig et al. 2004; Wellems et al. 1990). The role of pfmdr1 in QNR was consistent with the report that the N1042D substitution in PfPgh1 contributed to QNR (Sidhu et al. 2005), and the involvement of PfNHE in QNR was also supported by a recent observation that elevated PfNHE activities were found in parasites with high levels of QN IC50 (Bennett et al. 2007). Both PfPgh1 and PfNHE may play a role in regulating cytosolic and/or vacuolar pH, leading to changes in drug accumulation (Bennett et al. 2007; Rohrbach et al. 2006). Other unknown transporters, particularly ABC transporters, may also contribute to QNR, because the P. falciparum parasite response to QN is probably a multi-gene trait (Ferdig et al. 2004; Mu et al. 2003), and the requirement of multiple loci for QNR may explain why QN is still effective in treating malaria parasites after ~350 years of use. Again, the molecular mechanism and mutations underlying QNR require further investigation.

Resistance to artemisinin (ART)?

ART, also called Qinghaosu, was first isolated from a Chinese herb Huanghuahao (Artemisia annua L.) in 1972, and its structure and pharmacologic properties were first characterized and published by Chinese scientists in 1978 (Qinghaosu Research Group 1978; Zhang 2006). Definitive antimalarial activities of ART were published in an English-language Chinese medical journal in 1979 (Qinghaosu Antimalarial Co-ordinating Group 1979). ART and its derivatives are safe and effective against all asexual stages and gametocytes, particularly when administered in combination with other drugs (Nosten and White 2007). Resistance to ART and its derivatives has not yet been confirmed, although decreased parasite susceptibility to the drug has been reported (Laufer et al. 2007; Lim et al. 2005; Wongsrichanalai and Meshnick 2008). Positive correlations in parasite responses (in vitro IC50) to ART, MQ, HAL, lumefantrine and possibly QN and partial negative correlation of ART to CQ have been reported (Basco and Le Bras 1992, 1993; Chaijaroenkul et al. 2005; Cowman et al. 1994; Pradines et al. 2006; Price et al. 2004). Selection of P. falciparum parasites with MQ usually leads to increased MQ IC50 but increased sensitivity to CQ and vice versa, which can again be correlated with increased pfmdr1 CN (Cowman et al. 1994). Therefore, the low-level increase in ART IC50 in some parasites could reflect cross-resistance to MQ or QN, or might have been selected by other drugs rather than by true ART pressure. These cross-resistance patterns suggest that similar pathways or genes are involved in metabolism and/or parasite uptake of these drugs and raise an important issue regarding the use of drug combinations containing ART and its derivatives with either MQ or HAL. Considering the extensive MQR in Southeast Asia and negative correlation of ART and CQ, use of MQ and ART combinations may not be much better than a combination of CQ and ART, particularly in West Africa where CQ is still quite effective.

Another candidate gene that has been associated with parasite response to ART is PfATP6 (Jambou et al. 2005). ART was shown to inhibit the ATPase activity in an oocyte expression system (Eckstein-Ludwig et al. 2003); however, the association of mutations in PfATP6 with ART resistance requires further confirmation, particularly as clinicians are uncertain whether true ART-resistant parasites have even arisen yet. For more information on other antimalarial drugs and resistances, readers are referred to other excellent reviews (Baird 2005; Le Bras and Durand 2003; White 1999).

The putative interactions of different proteins in the food vacuole are summarized in Fig. 2.

Fig. 2

Potential molecular interactions in P. falciparum responses to multiple antimalarial drugs. A White pathway: CQ pressure leads to mutations in PfCRT that can affect CQ transport and/or FV pH. To compensate for mutations in pfcrt, other changes such as CN changes or nucleotide substitutions in pfmdr1 and other molecules may be required. The reduction in pfmdr1 CN under CQ selection suggests that PfPgh1 may act as a transporter, transporting CQ in the opposite direction to that of PfCRT. B Green pathway: QN resistance requires changes in multiple genes, which could explain the slow emergence of QNR and how QN use might have contributed to pfmdr1 CN increase in Africa. C Black pathway: MQ and HAL could act on pfmdr1 and/or on other unknown genes, leading to increases in pfmdr1 CN. Amplification of pfmdr1 may not be the major mechanism mediating MQ resistance, but similar to CQR, CN changes may represent a compensatory response to restore parasite fitness. D Blue pathway: involvement of pfmdr1 in ART resistance is likely a background low-level association that could be due to selection by other drugs. Clinical ART treatment failures due to ART resistance have yet to be confirmed and therefore the role of PfATP6 in resistance remains questionable. FV food vacuole, RBC red blood cell, CN copy number, CQ chloroquine, QN quinine, MQ mefloquine, HAL halofantrine, and ART artemisinin. Note: PfVPs are likely to be on FV membrane, but their locations require further confirmation

Drug assays and phenotype measurement

Genome-wide association studies now offer an opportunity to detect genome variation underlying parasite drug responses. While much attention has focused on the development of high-throughput techniques to detect genetic variation (genotyping), the most critical issue is the accurate measurement of parasite drug response (phenotyping). Over the years, many methods have been developed to evaluate P. falciparum parasite responses to antimalarial drugs and to monitor therapeutic efficacy. For drug resistance surveys, both in vivo and in vitro methods have been used. In vivo surveys analyze parasite clearance time using established treatment regimens, typically involving the microscopic examination of a patient blood smear 7–28 days after drug administration following World Health Organization guidelines (WHO 2000). For a more quantitative measure, many in vitro drug assays have also been developed. These assays are based on measuring parasite growth or growth inhibition under various drug concentrations, and proliferation is quantified by either counting parasitemia microscopically (Rieckmann et al. 1978), measuring [3H] hypoxanthine incorporation (Desjardins et al. 1979), parasite lactate dehydrogenase activity (Makler et al. 1993), or signals from antibodies against histidine-rich protein II (HRPII) (Noedl et al. 2005, 2002), and more recently by quantifying the amount of parasite DNA with SYBR green or DAPI dyes (Bennett et al. 2004; Johnson et al. 2007; Smilkstein et al. 2004). These assays have been widely used to survey parasite responses to drugs in laboratories and at field sites. Testing drug responses at field sites is particularly challenging because parasites are often tested directly following isolation from patient blood samples or after a very brief time in tissue culture. There are many factors that can influence drug test results that are assessed either in vivo or in vitro using blood samples collected directly from patients. 
(1) Parasites usually require a period of adaptation before they can grow in culture, and many parasites die because of problems surviving in vitro culture conditions, regardless of drug sensitivity; (2) patients may have taken medication before being sampled, possibly leading to false negative results; (3) patient blood usually contains different numbers of parasites, and drug assays are sensitive to variation in parasitemia; (4) patient blood is usually sampled just once before antimalarial drug treatment, preventing repeated tests at different times; and (5) mixed infections of different genotypes are highly prevalent in malaria endemic areas, particularly in Africa and Southeast Asia (Bendixen et al. 2001; Kobbe et al. 2006). Mixed infection of parasites resistant and sensitive to a drug may produce IC50 measurements that do not reflect the true IC50 values of the parasites (Liu et al. 2008). Therefore, while in vitro drug tests using patient blood samples may be a good method for therapeutic efficacy surveys, they may not provide an accurate measurement of individual parasite drug response, which is a critical requirement for mapping loci affecting drug resistance, particularly those controlled by multiple quantitative trait loci (QTL). Consequently, adapting field isolates to in vitro culture and cloning individual genotypes before drug testing will be necessary for accurate and reproducible measurements of parasite drug response to ensure that genome-wide association studies will be informative. It is difficult, however, to correlate in vitro IC50 measurements with a drug concentration that is predictive of clinical resistance. The IC50 value that corresponds to clinical drug failure depends on many factors. The threshold IC50 values for CQ and PYR are well characterized, but for many other drugs, IC50 values predictive of clinical resistance are unknown.
For CQR, the difference in IC50 between Dd2 (404 nM), a typical CQR parasite, and HB3 (34 nM), a CQ-sensitive (CQS) parasite, is roughly 12-fold (Mu et al. 2003). Each drug requires studies comparing in vitro IC50 values of parasites from clinical treatment failures and parasites from patients who are successfully cured in order to define in vitro drug resistance. As discussed above, many factors can influence both clinical treatment outcome and in vitro drug test results, and these factors will have to be carefully addressed when conducting this type of correlation study.
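As a rough illustration of how an IC50 value can be read off a growth-inhibition assay, the sketch below interpolates on log concentration between the two tested doses that bracket 50% growth. It is a minimal sketch, not the method of any study cited here: the function name, the dose series, and the growth readings are hypothetical, and in practice a full dose-response curve fit would usually be preferred.

```python
import math

def ic50_by_interpolation(concs_nM, growth_pct):
    """Estimate IC50 by linear interpolation on log10(concentration).

    concs_nM   -- drug concentrations in ascending order (nM)
    growth_pct -- parasite growth at each concentration, as % of the
                  drug-free control (any readout: hypoxanthine, SYBR green, ...)
    Returns the concentration at which growth crosses 50%, or None if the
    curve never crosses 50% within the tested range.
    """
    points = list(zip(concs_nM, growth_pct))
    for (c_lo, g_lo), (c_hi, g_hi) in zip(points, points[1:]):
        if g_lo >= 50.0 >= g_hi and g_lo != g_hi:
            frac = (g_lo - 50.0) / (g_lo - g_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical readings for two cloned parasite lines (illustrative only).
concs = [10, 30, 100, 300, 1000]
sensitive_growth = [90, 55, 20, 5, 1]
resistant_growth = [98, 95, 85, 60, 15]

ic50_sensitive = ic50_by_interpolation(concs, sensitive_growth)
ic50_resistant = ic50_by_interpolation(concs, resistant_growth)
print(f"sensitive IC50 ~ {ic50_sensitive:.0f} nM, resistant IC50 ~ {ic50_resistant:.0f} nM")

# Fold difference for the published values quoted in the text (Dd2 vs HB3).
print(f"Dd2/HB3 fold difference: {404 / 34:.1f}")
```

Because the estimate depends on where the 50% crossing falls between tested doses, a mixed infection whose curve is a blend of two genotypes can yield an intermediate value that matches neither parasite, which is one reason cloning before testing matters for mapping.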

Approaches for mapping malaria phenotypes

Linkage mapping

An important approach to studying gene function is genetic mapping. In principle, phenotypic differences can be linked to genetic variations, providing functional clues about the linked genes. This approach is based on detecting association or linkage of certain genetic variations with a specific phenotype (Su and Wootton 2004). Linkage mapping using progeny from genetic crosses has proven very powerful, leading to the identification of loci linked to drug resistance, immunity, and other traits (Furuya et al. 2005; Hayton et al. 2008; Martinelli et al. 2005; Peterson et al. 1988; Su et al. 1997; Vaidya et al. 1995; Wang et al. 1997; Wellems et al. 1990). A genetic cross is performed by feeding mosquitoes infectious gametocytes from two parents with different phenotypes. For P. falciparum crosses, following fertilization and development in the mosquito midgut, sporozoites are injected into the bloodstream of a chimpanzee by mosquito bite. After completing the liver stage, the recombinant progeny infect erythrocytes and are adapted to in vitro tissue culture. Individual clones are subsequently obtained after limiting dilution of blood samples containing the erythrocytic stages. One example of using genetic mapping to successfully identify genes associated with a specific phenotype is the mapping and identification of pfcrt, the gene that plays a key role in CQ resistance in P. falciparum (Fidock et al. 2000; Su et al. 1997; Wellems et al. 1990; Wellems et al. 1991). The advantage of mapping using progeny from genetic crosses is that the genetic backgrounds in the progeny are derived from the two parents, eliminating the noise observed in studies using field isolates with diverse genetic backgrounds.

Candidate gene association

Despite the power of classical genetics, the process of generating a cross is labor intensive, expensive, ethically challenging (use of non-human primates should be minimized), and requires special facilities for raising mosquitoes and primates that are available to only a few institutions. Therefore, another approach, population-based allelic association, is extremely attractive to malaria researchers, because techniques to detect genetic variation continue to advance and can be readily performed in individual laboratories. To date, most association studies in malaria parasites have used the ‘candidate gene’ approach. The search for an association of mutations in pfmdr1 with parasite responses to CQ is one example of the candidate gene method (Djimde et al. 2001; Duraisingh et al. 2000; Hayton and Su 2004; Su and Wootton 2004; Valderramos and Fidock 2006). This approach, however, requires some knowledge of the candidate genes to be tested. A second form of candidate gene association lies somewhere between single-gene association and genome-wide association. A group of genes within a pathway or having similar functions can be tested for association with a phenotype based on the known functions of these genes. One example for P. falciparum was the association of multiple transporters with parasite responses to CQ and QN (Mu et al. 2003), although the results from this study have yet to be confirmed. This study was based on the observation that various transporters, particularly the ABC transporters, have been shown to contribute to drug resistance in many organisms (Dean and Annilo 2005).

Genome-wide association

Genome-wide characterization of genetic variation in P. falciparum now provides the opportunity to perform genome-wide association studies (Jeffares et al. 2007; Mu et al. 2007; Volkman et al. 2007). These studies rely on the presence of linkage disequilibrium (LD) in the parasite populations being tested. LD refers to the nonrandom association of alleles, which can be affected by different factors such as the physical distance between the loci and the genetic markers used to genotype the parasite DNA, recombination rate, age of the mutations, population structures, and other factors. Sexual recombination is an obligatory part of the life cycle and malaria parasites produce both males and females. Therefore, self-fertilization can readily occur between gametes with identical genotypes, especially in areas with low transmission rates where single-clone infections prevail, or in homogeneous parasite populations that, for example, can result from a recent outbreak. LD can occur in natural populations where self-fertilization predominates to such an extent that the rate of recombination is not high enough to randomize genomes or break up clonal associations. This situation is relevant to malaria trait mapping, as outbreaks can occur in areas that were previously malaria free when travelers infected with malaria parasites return from malaria-endemic regions. A parasite population emerging from a recent selective drug sweep will also have limited diversity, leading to a relatively clonal population or high degree of LD. LD can also be the result of admixture of two or more subpopulations that have different allele frequencies. This situation can be created artificially when DNA samples for an association study are collected from parasites with different evolutionary histories or from different geographic origins. For example, P. falciparum and P. vivax parasites clustered largely according to their continental origins (Anderson and Day 2000; Joy et al. 2006; Mu et al. 2005; Volkman et al. 2007; Wootton et al. 2002), and mixing samples from different continents will likely produce artificial LD or false association if potential population structures are not properly evaluated. Care should also be taken with samples from different countries within a continent, or even within a country, with the exception of Africa, where high transmission rates and recombination rates may lead to low levels of population structure (Mu et al. 2005; Volkman et al. 2007). High recombination rates, however, can also quickly reduce LD, making association studies difficult. Therefore, the population structure of parasites in the area to be sampled should be assessed before starting a genome-wide association study to reduce the chance of obtaining false LD; population structure can be analyzed using various methods (Long 1991; Pritchard et al. 2000).
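The admixture effect described above can be made concrete with a small calculation. The sketch below computes the standard pairwise LD measures D and r² from haploid two-locus genotypes (blood-stage parasites are haploid, so each cloned isolate contributes one haplotype) and then pools two hypothetical populations that each show no LD on their own. The function name and all genotype counts are invented for illustration.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic loci from haploid genotypes.

    haplotypes -- list of (a, b) tuples, each allele coded 0 or 1;
                  one tuple per cloned (single-genotype) parasite.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # coefficient of disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # squared allelic correlation
    return d, r2

# Two hypothetical populations, each in linkage equilibrium internally
# but with very different allele frequencies (0.9 versus 0.1 at both loci).
pop1 = [(1, 1)] * 81 + [(1, 0)] * 9 + [(0, 1)] * 9 + [(0, 0)] * 1
pop2 = [(1, 1)] * 1 + [(1, 0)] * 9 + [(0, 1)] * 9 + [(0, 0)] * 81

d1, r2_pop1 = ld_stats(pop1)
d2, r2_pop2 = ld_stats(pop2)
d_mixed, r2_mixed = ld_stats(pop1 + pop2)

print(f"r2 within pop1: {r2_pop1:.4f}")
print(f"r2 within pop2: {r2_pop2:.4f}")
print(f"r2 after pooling: {r2_mixed:.4f}")
```

Neither population carries any association between the two loci, yet the pooled sample shows substantial r², which is the kind of spurious signal that assessing population structure before sampling is meant to prevent.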

The chance of detecting LD between a genetic marker and an affected gene decreases with increasing distance between the affected gene and the marker, as well as with the number of generations since the origin of the mutation (Collins and Morton 1998; Kruglyak 1999). A malaria parasite can complete its life cycle within 2 months if optimal transmission conditions exist. Because of this short generation time and high recombination rate, a large number of genetic markers may be required to map genes in organisms such as malaria parasites. The recombination rate was estimated to be ~15 kb per cM in a genetic cross and among some field populations (Mu et al. 2005; Su et al. 1999; Volkman et al. 2007). The requirement for high marker coverage is particularly true for mapping genes with ancient mutations or mutations with unknown history, such as genes that contribute to virulence or parasite development. Additionally, a disease phenotype such as virulence is likely to be the result of host and parasite interactions, and mapping these disease-related phenotypes will require a consideration of variation in the host genome as well as variation in the parasite genome. This will likely become another important topic in malaria research, as high-density human genotyping arrays containing >500,000 features are now available for genome-wide association studies in humans (Gunderson et al. 2006). These arrays can be used to scan the human genome for loci influencing the pathology and outcome of malaria infection.
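The dependence of LD on marker distance and on generations elapsed can be sketched with the textbook decay relation D_t = D_0(1 - c)^t, where c is the recombination fraction between marker and gene. The sketch below uses the ~15 kb per cM figure quoted above; the Haldane map function, the `outcrossing` scaling factor (a crude stand-in for the effect of self-fertilization), and the example distances and generation counts are assumptions made for illustration rather than values from the cited studies.

```python
import math

KB_PER_CM = 15.0  # map unit size quoted in the text (~15 kb per cM)

def recomb_fraction(distance_kb, kb_per_cm=KB_PER_CM):
    """Haldane map function: physical distance -> recombination fraction."""
    morgans = distance_kb / kb_per_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))

def ld_remaining(distance_kb, generations, outcrossing=1.0):
    """Fraction of the initial D expected to remain after some generations.

    outcrossing -- assumed fraction of matings between different genotypes;
                   values below 1 model self-fertilization, which lowers the
                   effective recombination rate (a simplifying assumption).
    """
    c_eff = recomb_fraction(distance_kb) * outcrossing
    return (1.0 - c_eff) ** generations

# Up to ~6 generations per year if the life cycle completes in 2 months,
# so 60 generations corresponds to roughly a decade under optimal transmission.
for kb in (1.5, 15.0, 150.0):
    high = ld_remaining(kb, generations=60, outcrossing=1.0)
    low = ld_remaining(kb, generations=60, outcrossing=0.2)
    print(f"{kb:6.1f} kb: outcrossing 1.0 -> {high:.3f}, outcrossing 0.2 -> {low:.3f}")
```

Under these assumptions LD persists much longer around closely linked markers and in populations with little outcrossing, which is consistent with the need for denser marker sets when mapping older mutations in high-transmission settings.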

Most recent genome-wide association studies were conducted using a multi-stage approach (Hirschhorn and Daly 2005; Lowe et al. 2004). For the initial scan, a modest threshold is used to identify potential candidates. All the positive candidates are then tested in a second independent population with a sample size similar to that of the initial population. At the second stage, only a small fraction of the initial markers, in particular those that were positive in the initial scan, are tested. Indeed, the two-population concept has already been applied to candidate gene association in P. falciparum (Anderson et al. 2005).
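The two-stage logic can be outlined in a few lines. In the sketch below, markers passing a lenient threshold in the first population are carried forward, and only those are tested in the second population, with the second-stage threshold divided by the number of markers carried forward. The Bonferroni-style correction, the threshold values, the marker names, and the p-values are all assumptions chosen for illustration; the cited studies may use different criteria.

```python
def two_stage_scan(stage1_p, stage2_p, stage1_alpha=0.01, stage2_alpha=0.05):
    """Filter markers through a two-stage association design.

    stage1_p -- dict of marker -> p-value from the initial genome-wide scan
    stage2_p -- dict of marker -> p-value from the independent second
                population (only carried-forward markers need be typed)
    Returns (candidates, confirmed) as lists of marker names.
    """
    # Stage 1: a modest threshold keeps potential candidates.
    candidates = [m for m, p in stage1_p.items() if p <= stage1_alpha]
    if not candidates:
        return [], []
    # Stage 2: correct only for the markers actually re-tested (assumed rule).
    corrected_alpha = stage2_alpha / len(candidates)
    confirmed = [m for m in candidates if stage2_p.get(m, 1.0) <= corrected_alpha]
    return candidates, confirmed

# Hypothetical p-values for four markers.
scan = {"marker_1": 0.0004, "marker_2": 0.2, "marker_3": 0.008, "marker_4": 0.6}
replication = {"marker_1": 0.001, "marker_3": 0.4}

candidates, confirmed = two_stage_scan(scan, replication)
print("carried forward:", candidates)
print("replicated:", confirmed)
```

Typing only the carried-forward markers in the second population reduces genotyping cost and the multiple-testing burden, at the price of missing any locus that fails the first threshold by chance.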

To date, no well-designed genome-wide association study for malaria traits has been reported. One proof-of-principle study used hundreds of microsatellites (MSs) and showed that a large segment of chromosome 7 containing pfcrt could be detected using the MS markers (Wootton et al. 2002). With the development of high-throughput genotyping technologies, many genome-wide association studies for genes affecting drug resistance and other traits can be expected soon.

Methods of high-throughput genotyping

Microsatellites

MSs and single nucleotide polymorphisms (SNPs) are the two most commonly used genetic markers for genotyping and for genetic mapping studies in malaria parasites. Compared with SNPs, MSs are usually highly polymorphic, with multiple alleles in the parasite populations (Su et al. 1999; Su and Wellems 1996). The higher diversity of MSs can be advantageous for mapping genes that have undergone a recent drug selective sweep (Anderson 2004; Su et al. 2007; Wootton et al. 2002); however, the high mutation rates of MS markers may lead to homoplasy or disruption of LD if the parasite populations have been separated for a long period of time (Anderson et al. 2000). Although a large number of MSs can be typed relatively easily using an automatic DNA sequencer and multiplex reactions, the throughput of MS typing is quite limited compared with SNP typing methods that have recently been developed (Gunderson et al. 2005; Hagiwara et al. 2007; Hardenbol et al. 2005; Kennedy et al. 2003; Lindblad-Toh et al. 2000; Steemers and Gunderson 2007). Indeed, SNPs are replacing MSs for large-scale genotyping; however, MSs can still be very useful for small-scale typing projects, for example, to verify the identity of a parasite or to identify mixed infections. Polymorphic MSs are extremely abundant (one in less than 1 kb) in the P. falciparum genome (Su and Wellems 1996) but not in the P. vivax genome (Feng et al. 2003). In principle, an extremely high-density MS genetic map can be developed for mapping P. falciparum traits if an economical and high-throughput typing method can be developed (Li et al. 2007).

Single nucleotide polymorphisms

With the development of new technologies and recent collections of large numbers of SNPs (Jeffares et al. 2007; Mu et al. 2007; Volkman et al. 2007), it is becoming more and more practical and economical to conduct genome-wide association studies that require no prior assumptions or knowledge of gene function or mutations underlying a phenotypic change (Anderson 2004). These high-throughput SNP typing methods (Gunderson et al. 2005; Hagiwara et al. 2007; Hardenbol et al. 2005; Kennedy et al. 2003; Lindblad-Toh et al. 2000; Steemers and Gunderson 2007) have recently led to the successful identification of several candidate genes (loci) associated with various human diseases (Buch et al. 2007; Hakonarson et al. 2007; Saxena et al. 2007; Scott et al. 2007; Sladek et al. 2007; Tomlinson et al. 2007; Winkelmann et al. 2007; Zanke et al. 2007).

Various methods have been or are being developed for typing SNPs in malaria parasites, such as pyrosequencing (Cheesman et al. 2007; Takala et al. 2006) and real-time PCR/MCA assay (Mens et al. 2008). Effective and user-friendly high-throughput SNP typing methods for malaria parasites are still under development, but the principles and methods used for typing human SNPs can readily be adapted to malaria parasite genomes. Currently, the most promising methods for typing malaria parasites are microarray-based hybridizations. Indeed, a relatively high-density array has been successfully used to detect single feature polymorphisms (Kidgell et al. 2006). Another Affymetrix array containing ~2.32 million P. falciparum probes (25-mer) designed at the Sanger Center, UK, is commercially available and can be explored for typing the P. falciparum genome (PFSANGER array) (Jiang et al. 2008b; Mourier et al. 2008). A third P. falciparum-specific array using the molecular inversion probe (MIP) method (Absalan and Ronaghi 2007; Hardenbol et al. 2003) has also been developed for typing parasite SNPs (Mu et al., unpublished). This array contains approximately 3,500 SNPs from the P. falciparum genome and can be useful for mapping mutations of recent occurrence, such as those conferring drug resistance. The advantages of the MIP array include a lower cost per slide/chip (~$200 vs. ~$500 for a tiling array), because it uses standard oligonucleotide designs, and the ability to work with a relatively small amount of DNA (~100 ng) owing to the specific amplification of the DNA fragments to be tested. The disadvantage of the MIP array is the limited number of SNPs that can be printed in an array (200,000 maximum currently) due to limits in the techniques used. All of these arrays will play an important role in mapping malaria phenotypes. For the P. falciparum genome, one potential problem is that the high AT content and abundance of repeat elements in the genome may prevent designing probes to cover the majority of the noncoding regions properly. Therefore, printing arrays with much higher numbers of probes may not significantly improve the coverage of the P. falciparum genome. Other genotyping arrays such as Affymetrix 3 K and 75 K arrays (S. Volkman, D. Wirth et al., Harvard University), a NimbleGen 60-mer oligo array (M. Ferdig, Notre Dame University), and another Affymetrix tiling array with ~5 million probes (E. Winzeler, Scripps Institute) have been described at various meetings, but have not yet been published.

High-throughput sequencing

Another promising approach for the large-scale genotyping of parasites is high-throughput parallel genome sequencing, as new technologies have dramatically reduced the time and expense of sequencing (Bennett et al. 2005). Obtaining the DNA sequence is more informative than typing known SNPs, and if the cost of sequencing continues to decrease to a point comparable to the cost of array-based typing, massively parallel sequencing using the Solexa Sequence Analyzer (Illumina, Inc.), the Genome Sequencer FLX System (454, Inc.) or the SOLiD system (Applied Biosystems, Foster City, CA) may become the first choice for large-scale genotyping. For example, this ‘next-generation’ sequencing technology has reduced the cost of sequencing a human genome, which originally cost ~$2.7 billion and took 13 years to complete, to <$1.5 million, with a full human genome decoded in just 4 months (Wadman 2008). Ultimately, the goal is to reduce this to $1,000 or less (Collins et al. 2003). However, because these methods only sequence short stretches of DNA (~35–250 bp), they currently depend on the availability of a reference genome. For the immediate future, therefore, they will likely be used only for genome resequencing. This makes the methods highly applicable to studying P. falciparum genetic variation, but even so, the high AT content and repetitive nature of the genome can present problems in sequence alignment that may lead to false SNP calls. In addition, handling the large amount of sequence data generated can be a difficult task for the average laboratory without high-capacity computing power.

Factors that may affect mapping studies in malaria parasites

Before a study can be properly performed, some practical issues such as the number and types of genetic markers, sample size, amount of DNA required, and statistical methods for analysis have to be considered. Other factors that can influence mapping outcomes and sample size calculations include the strength of LD between the trait-influencing allele and neighboring marker alleles in the parasite populations to be studied, the frequency of the trait-influencing allele, the effect or penetrance of the trait-influencing allele on the phenotype, parasite population admixture, sample selection bias, and frequencies of recombination between neighboring marker loci (Schork 2002; Wang et al. 2005). Misclassification of phenotypes (or inaccuracy in drug tests) is another critical component that can lead to false associations but often does not receive adequate attention. Calculations of statistical power and sample size are also often problematic because many of these parameters are unknown or difficult to predict. Therefore, it is difficult to estimate the sample size and marker coverage required for an association study without a good understanding of the factors that can influence the outcome of the study. It is possible, however, to make some estimates of these parameters based on what we know about the transmission intensity, recombination rate, and the history and incidence of drug resistance. To address some of the frequently asked questions, here we discuss a few important and unique issues associated with studies of human malaria parasites.

Blood samples and DNA preparation

One advantage of performing genetic association studies in malaria parasites is that the haploid genome requires no phase determination; however, there are many difficulties associated with working with malaria parasites. First, patient blood samples usually have relatively low parasitemia (<1%), making it difficult to obtain a large quantity of parasite DNA for genotyping. Second, patients are often infected with parasites of different genotypes that require isolation of individual clones. These issues not only create problems for drug assays, but also make genotyping of DNA extracted from blood samples difficult. In most field studies, a small amount of blood (1–2 drops) is typically spotted on a filter paper for easy storage and transportation. DNA extracted from the filter paper may be sufficient for PCR amplification (often requiring nested PCR) but is generally not sufficient for high-throughput methods such as array hybridization or multiple rounds of MS typing. Proper genotyping and drug assays often require adaptation of the parasites to in vitro culture and, if necessary, cloning of individual parasites. This process can take weeks or months before an appropriate amount of DNA from a cloned parasite can be obtained for analysis. To streamline the process and to avoid laborious parasite cloning, DNA from patient blood can be genotyped directly using a set of highly polymorphic MS markers (6–10 markers); only the samples containing a single infection are selected for further analysis and adapted to in vitro culture. Genome-wide amplification of DNA isolated from field samples using commercial kits such as REPLI-g® Whole Genome Amplification (QIAGEN) or GenomiPhi DNA Amplification kit (Wang et al. 2008) can increase the amount of DNA in a sample 300–400 times. However, this approach does not solve the problem of mixed infections. 
In addition, an individual white blood cell contains ~100-fold more DNA than a parasite; so even though DNA can be amplified, the majority of the amplified DNA will be host derived if the human DNA is not removed before amplification. To obtain a sufficient amount of DNA for large-scale genotyping, it is necessary to extract DNA from blood samples after removing white blood cells or to obtain DNA from culture-adapted parasites. Various methods and commercial products have been used to remove white cells before DNA extraction (Carlton et al. 2001).
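
The scale of the host-DNA problem can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation under stated assumptions: about 5 million erythrocytes and 7,000 white blood cells per microliter of blood (typical textbook values, not figures from this article), one parasite genome per infected erythrocyte, and the ~100-fold DNA ratio quoted above. The function and parameter names are hypothetical.

```python
def parasite_dna_fraction(parasitemia, wbc_removed=0.0,
                          rbc_per_ul=5_000_000, wbc_per_ul=7_000,
                          wbc_to_parasite_dna_ratio=100.0):
    """Fraction of the DNA in a blood sample that is parasite derived.

    parasitemia: fraction of erythrocytes infected (0.01 means 1%).
    wbc_removed: fraction of white blood cells depleted before extraction.
    Assumes one parasite genome per infected erythrocyte.
    """
    parasite_genomes = parasitemia * rbc_per_ul
    # Host DNA expressed in parasite-genome equivalents.
    host_equivalents = (wbc_per_ul * (1.0 - wbc_removed)
                        * wbc_to_parasite_dna_ratio)
    total = parasite_genomes + host_equivalents
    return parasite_genomes / total if total > 0 else 0.0


if __name__ == "__main__":
    print(round(parasite_dna_fraction(0.01), 3))                    # no depletion
    print(round(parasite_dna_fraction(0.01, wbc_removed=0.99), 3))  # 99% depletion
```

With these assumed cell counts, parasite DNA makes up only a few percent of the total at 1% parasitemia, and unbiased whole-genome amplification would leave that proportion unchanged; removing most white blood cells before extraction shifts the balance strongly toward parasite DNA.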

Drug selective sweep and sampling sites

Because almost all the parasites circulating in Southeast Asia (Thailand, for example) are resistant to CQ (i.e., the resistant alleles are fixed), we should be able to readily detect a reduction in allelic diversity in the chromosome region surrounding the pfcrt gene among CQR parasites (Wootton et al. 2002), although the region of reduced diversity may be small after ~60 years of transmission and recombination. CQR parasites appeared on the Thai–Cambodia border in the late 1950s, spread to East Africa in the late 1970s, and reached West Africa in the 1990s (Fig. 1a) (Payne 1987; Wootton et al. 2002). It can therefore be expected that higher levels of LD surrounding the pfcrt gene will be found in West Africa because, despite higher transmission rates, there has been less time for recombination to break down linkage and to reduce LD. Indeed, a chromosome 7 segment surrounding pfcrt that contained more than 200 kb (~1% of the genome) exhibited LD in CQR parasites collected from Southeast Asia (some were collected more than 20 years ago) and Africa during 1980–2000 (Wootton et al. 2002). In this case, the region containing pfcrt was easily detected using 342 genome-wide MS markers. The results also showed the presence of a large CQ selective sweep from Southeast Asia to Africa (Wootton et al. 2002) and that signatures of a drug selective sweep can be used to detect loci conferring drug resistance (Anderson 2004; Nair et al. 2003; Roper et al. 2003; Roper et al. 2004). Whole genome long-range haplotype tests that have been widely used to detect positive selection in the human genome can also be applied to search for signatures of selection in the parasite genome (Sabeti et al. 2007; Volkman et al. 2007; Zhang et al. 2006). However, the selection signature may not be obvious if the beneficial alleles have multiple origins (Nair et al. 2007) or if multiple genes in the genome are subject to the same selection force.
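
One commonly used way to quantify a reduction in allelic diversity at a marker is the expected heterozygosity, He = n/(n − 1) × (1 − Σp_i²), compared between resistant and sensitive parasites. The sketch below is a minimal illustration of that statistic for haploid MS genotypes; it is not presented as the exact analysis of the cited studies, and the function names and example alleles are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Unbiased expected heterozygosity at one marker.

    alleles: one allele call per (haploid, single-clone) parasite sample.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two typed samples")
    counts = Counter(alleles)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def diversity_ratio(resistant_alleles, sensitive_alleles):
    """He in resistant parasites relative to He in sensitive parasites.

    Values well below 1 at markers clustered in one chromosome region
    are the kind of signal expected around a selected locus.
    """
    he_sensitive = expected_heterozygosity(sensitive_alleles)
    if he_sensitive == 0:
        return float("nan")
    return expected_heterozygosity(resistant_alleles) / he_sensitive


if __name__ == "__main__":
    resistant = ["a"] * 9 + ["b"]   # nearly fixed for one MS allele
    sensitive = list("abcdefghij")  # every sample carries a different allele
    print(round(diversity_ratio(resistant, sensitive), 2))
```

Scanning such a ratio along each chromosome, marker by marker, would highlight regions where resistant parasites share an unusually uniform haplotype.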

The consideration of LD and history of drug resistance highlights the important issue of sampling location when designing an association study. It is preferable to sample parasites from locations with frequencies of resistant parasites of between 20 and 50% so the proportions of both resistant and sensitive parasites are approximately equal, therefore giving a good chance of having detectable LD. A resistant allele frequency of 20–50% will greatly reduce the sample size required to detect associations (Wang et al. 2005).

Number of genetic markers available to use in a study

Because of difficulties in obtaining high-quality DNA samples from cloned parasites and the laborious processes of generating accurate phenotype data in drug assays, use of as small a sample size as possible will save time and money. Considering that most drug resistances are recent founder mutations with relatively high variant allelic odds ratios, it is possible to detect mutations conferring resistance using a few hundred samples typed with 300–500 genetic markers (Wootton et al. 2002). Of course, for most studies, the scale of LD in parasite populations is unknown, and many more markers will have to be used for typing parasite populations with high recombination rates and for traits that have unknown history. Fortunately, high-throughput genotyping methods are now (or soon will be) available, and the number of genetic markers should no longer be a major concern for mapping drug-resistance genes.

An intriguing question is how many SNPs that are useful for genetic mapping can be expected from the P. falciparum genome. More than 100,000 SNPs have been deposited in the malaria database PlasmoDB (www.plasmoDB.org); however, the majority of the SNPs could be private SNPs (present at a frequency lower than 5% among parasite isolates) or derived from highly polymorphic antigen gene families (Jeffares et al. 2007; Mu et al. 2007; Volkman et al. 2007). The number of common SNPs that will be useful for genetic mapping in P. falciparum is likely to be between 50,000 and 100,000 (Mu et al. 2007), equivalent to the density provided by the estimated 10 million SNPs in the human genome (Collins et al. 2003). Fortunately, the parasite has a small genome, and even 50,000 SNPs can provide a map with a marker density of one SNP per 460 bp. Finally, the number of genetic markers required will depend on the trait being analyzed and many other factors; there is no benefit in using more markers in a study than are needed.
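
The marker densities discussed here follow from simple arithmetic, sketched below. The genome size of ~23 Mb is an assumption (it is the value implied by 50,000 SNPs at one per 460 bp), the ~15 kb per cM conversion is the recombination estimate given earlier in this article, and the function names are hypothetical.

```python
GENOME_SIZE_BP = 23_000_000  # assumption: ~23 Mb P. falciparum nuclear genome
KB_PER_CM = 15.0             # recombination estimate quoted earlier in the text


def marker_spacing_bp(n_markers, genome_size_bp=GENOME_SIZE_BP):
    """Average physical spacing between evenly distributed markers."""
    return genome_size_bp / n_markers


def marker_spacing_cm(n_markers, genome_size_bp=GENOME_SIZE_BP,
                      kb_per_cm=KB_PER_CM):
    """Average genetic spacing, converting bp to cM at a uniform rate."""
    return marker_spacing_bp(n_markers, genome_size_bp) / 1000.0 / kb_per_cm


if __name__ == "__main__":
    # 342 MS markers (Wootton et al. 2002) versus candidate SNP panels.
    for n in (342, 3_500, 50_000, 100_000):
        print(n, round(marker_spacing_bp(n)), "bp",
              round(marker_spacing_cm(n), 3), "cM")
```

Real markers are not evenly spaced and recombination is not uniform along chromosomes, so these averages are only a first guide to whether a panel is dense enough for the expected extent of LD.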

Influence of a gene on parasite traits

Another factor to consider is how much influence a mutation (or allele) exerts on a trait, which is usually unknown before identification of the mutation. An odds ratio of exposure to the susceptible genetic variant in cases compared with that in controls greater than 1.5 will also significantly lower the number of samples required (Wang et al. 2005). Although most drug-resistance traits are likely to be phenotypes involving multiple loci, only a limited number of genes may play a major role in conferring resistance, such as pfcrt and pfdhfr in CQ and PYR resistance, respectively. Indeed, mutations in these two genes can change parasite response to CQ and PYR dramatically, converting a clinically sensitive parasite into a resistant parasite. The existence of a few genetic loci having large effects and numerous loci with small effects appears to be true for most phenotypes (Wang et al. 2005). Therefore, the odds ratios of resistant alleles with resistant phenotypes are likely to be high for mutations that make a significant contribution to drug-resistant phenotypes in malaria parasites.
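
To see how the odds ratio and allele frequency drive sample size, the sketch below uses the textbook normal-approximation formula for comparing two proportions (allele frequency in resistant versus sensitive parasites). It is a simplified single-marker calculation, not the method of Wang et al. (2005): it assumes haploid parasites (one allele per sample), equal group sizes, and no correction for genome-wide multiple testing. All function names and default values are illustrative.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by the control frequency and OR."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def samples_per_group(control_freq, odds_ratio, alpha=0.05, power=0.8):
    """Approximate number of parasites needed in each phenotype group."""
    if odds_ratio == 1.0:
        raise ValueError("an odds ratio of 1 means there is no effect to detect")
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


if __name__ == "__main__":
    for freq in (0.02, 0.3):
        for oratio in (1.5, 5.0):
            print(freq, oratio, samples_per_group(freq, oratio))
```

The required sample size falls sharply as the odds ratio rises and is far larger when the allele is rare, which is consistent with the expectation that major-effect resistance mutations at intermediate frequency are the most tractable targets. A genome-wide scan would need a much smaller alpha than the 0.05 default used here.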

Summary

Recent developments in high-throughput genotyping technologies have made it possible to conduct genome-wide mapping studies in malaria parasites. For mapping genes conferring drug-resistance traits, genome-wide association studies are practical because relatively recent founder mutations underlie drug resistance, and continued drug pressure will help preserve LD. For mapping other phenotypes such as virulence and growth rate differences, much higher densities of genetic markers will be required due to the complexity of the phenotypes and unknown evolutionary histories of the mutations. The difficulties in mapping malaria phenotypes include the collection of large numbers of parasite samples and the accurate measurement of phenotypic differences; however, these problems can be overcome, and we can expect to see informative genome-wide association studies in P. falciparum in the near future. Whether these studies will help to identify new targets for malaria control remains to be seen.