Assumptions in research in sport and exercise psychology

https://doi.org/10.1016/j.psychsport.2009.01.004

Abstract

Objectives

The aim of this article is to outline how certain key assumptions affect the quality and interpretation of research in quantitative sport and exercise psychology.

Methods

A review of three common assumptions made in the sport and exercise psychology literature was conducted, focusing on assumptions relating to research validity and the treatment and interpretation of observations. A central theme of this discussion is the assumption that research observations reflect true effects in a population.

Results

Assumptions often made in sport and exercise psychology research were identified in three key areas: (1) validity, (2) inferences of causality, and (3) effect size and the “practical significance” of research findings. Findings indicated that many studies made assumptions about the validity of the self-report psychological measures adopted and few provided a comprehensive evaluation of the validity of these measures. Researchers adopting correlational designs in sport and exercise psychology often infer causality despite such conclusions being based on theory or speculation rather than empirical evidence. Research reports still do not include effect size statistics as standard and confine the discussion of findings to statistical significance alone rather than commenting on “practical significance”.

Conclusion

Research quality can only be evaluated with due consideration of the common assumptions that limit empirical investigation in sport and exercise psychology. We offer some practical advice for researchers, reviewers, and journal editors to minimise the impact of these assumptions and enhance the quality of research findings in sport and exercise psychology.

Introduction

Research reports in sport and exercise psychology often conclude with a familiar refrain: an acknowledgement of limitations and future research directions. While it is important to acknowledge that investigations are seldom ‘perfect’, few researchers give extensive consideration to these limitations and journal editors often make the inclusion of ‘limitations sections’ a requirement without consideration of their meaning and their impact on the quality of the research. Readers may also treat limitations sections in the same manner, dismissing them as troublesome caveats imposed by reviewers and editors without considering how these limitations and the assumptions that accompany them might affect the interpretation of the findings.

Importantly, limitations sections often highlight assumptions that may have a substantive impact on the interpretation of the research and the meaning of the findings. These are likely to be germane to the reader's overall evaluation of the quality of the research and the measure of its contribution to the sport and exercise psychology literature. Therefore, the assumptions that researchers make affect evaluations of research quality at both the micro- and macro-level (see Weed, 2009). Assumptions impinge on quality at the micro-level because they delimit whether the researcher has satisfactorily addressed the hypotheses of the investigation and adopted appropriate methods and analyses to confirm or falsify those hypotheses. Assumptions influence quality at the macro-level because they determine the extent to which investigations making such assumptions can pose important and relevant questions in the wider field, the appropriateness of methods and methodologies in addressing those questions, and, as a result, the validity of the contribution made to the field. Limitations sections that outline research assumptions should not, therefore, be summarily dismissed with such a blasé attitude.

The purpose of this article is to identify some of the assumptions that underpin research in sport and exercise psychology, to evaluate the impact of these assumptions on the quality of the research and the meaning of the findings, and to offer some practical advice to researchers and the reviewers and editors of journals on how to give due consideration and account for these assumptions when designing, conducting, analysing, reporting, and evaluating research in the field.

People are very good at making assumptions. Indeed, there is an entire psychological literature on how people make inferences based on observations of others' behaviour. For example, research on the actor–observer effect (Jones and Harris, 1967, Jones and Nisbett, 1972) provided evidence that people tend to make dispositional (trait-like, stable) attributions about a person when observing their behaviour (e.g., if you see someone acting violently toward another person and encounter them again, you are likely to avoid them because you have labelled them as ‘violent or argumentative’). Similarly, when evaluating research, people who are familiar with psychological measures often make macro-level assumptions regarding the theory, methods, analyses, and findings based on previous experience. As with the actor–observer effect and the many attributional assumptions that people make in their everyday lives, assumptions can sometimes lead to erroneous conclusions regarding research findings, such as over-generalisations of validity and reliability when such generalisations are unfounded, and inferences of causality when the basis for such inferences is flimsy at best. Our aim in this article is to highlight some areas where people make assumptions regarding their findings at a micro-level that could potentially lead to erroneous conclusions at a macro-level, provide some illustrations of those assumptions (on many occasions using our own work!), and demonstrate how we (and others) have attempted to allay or resolve the problems arising from those assumptions. Specifically, we will focus on three main issues relating to assumptions: (1) validity, (2) inferences of causality and generalisability, and (3) effect size and “practical significance”.

Psychologists are generally taught the imperative of ensuring that their methods and instruments conform to acceptable criteria of validity in order to be confident that the effects they test (e.g., relations or differences among psychological variables) reflect the true effect in the population. There are six different forms of validity and each can be considered a separate component of overall validity: face validity, construct validity, concurrent validity, discriminant validity, predictive validity, and nomological validity. Often, researchers tend to rely on one form of validity (e.g., face or convergent validity) when developing methods and deem that sufficient before proceeding to test their hypothesised effects. However, this assumption is generally erroneous, because it is meeting acceptability criteria for each form of validity that provides converging evidence for the acceptability and relevance of the methods being used and for whether the findings can be ‘trusted’. Generally, meeting one of these validity criteria is only sufficient when the focus of the research is to evaluate a specific form of validity for a specific method or measure.

Most research in sport and exercise psychology is applied social psychology, and many social psychologists make assumptions about the validity of the measures they use, often presupposing that previous validation efforts mean that they can forgo tests of validity. Bagozzi (1981b) contends that certain methods of measurement can become so frequently used and “so extensive that questions of validity have sometimes been taken for granted” (p. 323), as in the case of expectancy-value models of attitudes toward physical activity (see Biddle et al., 2007, Hagger and Chatzisarantis, 2008). Further, researchers who do provide some analyses to evaluate “the adequacy of measures generally limit analysis to an assessment of reliability” (Bagozzi, 1981b, p. 323). As a consequence, it is difficult to evaluate whether the measures developed and used to test hypothesised effects conform to validity criteria. Furthermore, inferences drawn from these tests of effects are only as valid as the measures used in the tests; if the validity of methods and measures is not checked, the validity of the findings is open to question. This section will identify, define, and analyse the different types of validity relevant to sport and exercise psychology research and evaluate the problems that arise should each validity type be assumed without adequate preliminary validity testing.

A cursory look at the two most recent volumes of Psychology of Sport and Exercise (PSE; 2007, Vol. 9, Issues 1–6, and 2008, Vol. 10, Issues 1–5) and Journal of Sport and Exercise Psychology (JSEP; 2007, Vol. 29, Issues 1–5, and 2008, Vol. 30, Issues 1–3), which were available at the time of writing and represent the two most cited serials in the field, reveals that 77 of 79 articles (97.5%) in PSE and 40 of 47 articles (85.1%) in JSEP included self-report measures of psychological constructs or behaviour. While these journals and the particular issues were chosen for convenience and illustration, they are generally representative of the state of the literature in the field. This anecdotal analysis illustrates that researchers in sport and exercise psychology rely heavily on self-report measures of psychological traits, states, and behaviour. The self-report measures used are usually pen-and-paper questionnaires, but they occasionally include verbal reports, such as data collected in personal or telephone interviews, and there is an increasing use of online self-report measures. Although it is clear that research in psychology has moved on since Nisbett and Wilson's (1977) critique of self-report measures and the inaccessibility of inner states to the reporter, recent research has been critical of this over-reliance and has called for more direct measures of actual behaviour (Baumeister, Vohs, & Funder, 2007) or, at least, clear links between psychological measures and meaningful behavioural measures or outcomes (Andersen, McCullagh, & Wilson, 2007). Baumeister et al. (2007) therefore suggest that the psychology research community include “direct observation of behaviour wherever possible and in at least a healthy minority of research projects” (p. 396). This is a noble call, if perhaps a little unrealistic, but it illustrates an imperative: if sport and exercise psychology is a “science of self-reports” (Baumeister et al., 2007, p. 396), then its measures and methods need to bear up under precise scrutiny with respect to the validity of those self-reports, including verification against actual measures of behaviour or behavioural outcomes.

Researchers in sport and exercise should be mindful of six types of validity when considering the adequacy of self-report measures of psychological variables. These types of validity are heavily steeped in classical test theory and psychometric scaling (Kline, 2000, Kline, 2005, Nunnally and Bernstein, 1994). Face validity refers to researchers' and ‘expert’ judgements or ratings that the content of a self-report measure or item captures some or all aspects of the psychological construct of interest. In other words, the measures capture the true nature or essence of the construct. This type of validity is paramount in the early development of self-report measures to ensure that the items have content that is clearly representative of the construct under scrutiny and, importantly, will have meaning to respondents. Clearly, there is an element of subjectivity inherent in this approach and good practice generally dictates that a researcher collects a number of expert ratings and develops his/her self-report measures based on the consensus of the experts. Again, choosing your experts carefully is important here as it is essential that they live up to their billing and have a clear idea of the construct under scrutiny.

Another essential consideration is having a good theory or idea behind the measures and ensuring that the experts are fully versed in the theory and the purpose of the measure (McDonald, 1997). As we shall see, theoretical underpinning of the research and instrumentation is crucial and a common theme in other research assumptions in sport and exercise psychology, such as the inference of causality. Expert ratings of face validity may also be useful in re-evaluating the content of items on the basis of preliminary analyses aimed at evaluating other aspects of validity. Interestingly, given the development of sophisticated statistical analyses aimed at testing construct and other aspects of validity in self-report measures, few researchers pay much heed to face validity, although it is an important first step (Weiss, 1982).

One of the main symptoms of researchers' preoccupation with other forms of validity at the expense of face validity is the production of instruments that contain self-report measures or items that are effectively the same question, worded differently. This may yield highly acceptable validity and internal consistency statistics in statistical evaluations of other forms of validity, but such instruments may not capture essential elements of a construct, and subsequent tests of theory using the measure may not yield effects that are representative of the effect under scrutiny, i.e., they lack validity. A good example here is the distinction between affective (emotional) and cognitive (instrumental) attitudes toward sport and exercise (Trafimow & Sheeran, 1998). Research has clearly demonstrated that people are able to make the distinction between these different components of attitude and that the components have differential effects on intentions to participate in sport and exercise behaviour, and yet many measures tend to neglect or completely omit one or other aspect of the construct (Conner et al., 2007, Hagger and Chatzisarantis, 2005, Hagger and Chatzisarantis, 2008, Lowe et al., 2002, Rhodes and Courneya, 2003). This limits the validity of findings as they are confined to a narrow conceptualisation of the attitude constructs. Researchers should therefore not assume that their self-report measures sufficiently capture the essence of a construct. They would do well to employ experts to rate the content of a candidate self-report measure and its representativeness of the construct in question, provided, of course, the experts are bona fide experts!
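
One common way of summarising expert content ratings quantitatively (not discussed in the text above) is Lawshe's content validity ratio. The minimal sketch below shows the arithmetic under hypothetical item names and ratings; it is illustrative only, not a description of any procedure used by the authors.

```python
# Minimal sketch: summarising expert ratings of item content using
# Lawshe's content validity ratio (CVR). Item labels and ratings are
# purely hypothetical.

def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """CVR = (n_e - N/2) / (N/2); ranges from -1 to +1."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

# Hypothetical panel of 8 experts rating each candidate item as
# 'essential' (True) or not (False) for the construct of interest.
ratings = {
    "item_1": [True, True, True, True, True, True, True, False],
    "item_2": [True, True, False, False, True, False, True, False],
}

for item, votes in ratings.items():
    cvr = content_validity_ratio(sum(votes), len(votes))
    print(f"{item}: CVR = {cvr:+.2f}")  # retain items with high CVR
```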

Convergent validity is established through the convergence of the self-report measures or items that ostensibly make up, or constitute, an unobserved or theoretical psychological variable. This form of validity is often tested using correlation analyses of scores on the pool of self-report measures or items identified in the face validity assessment from a sample of the population of interest. The extent to which the items intercorrelate with each other provides an indication of their convergent validity. If the items are all highly correlated (typically a Pearson's r > .70), then the measures are said to ‘converge’ on the construct of interest. Factor analytic techniques are considered de rigueur for establishing convergent validity as they identify key clusters of correlations within a matrix of correlations between the component items of a self-report measure. For example, factor loadings from rotated exploratory factor analysis solutions represent the relative contribution each measure makes to the unobserved mathematical entity that is the proposed construct (Kline, 1994). This permits the researcher to identify whether the items considered a priori to capture the essence of the construct under scrutiny all contribute substantially to the unobserved variable that emerges from the analysis.
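
As a rough illustration of the convergent validity checks described above, the following sketch computes inter-item correlations and a single-factor exploratory factor analysis on simulated item scores. It assumes the third-party Python package factor_analyzer; the data and item names are hypothetical.

```python
# Minimal sketch: inter-item correlations and an exploratory factor
# analysis as an indication of convergent validity (hypothetical data).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(1)
latent = rng.normal(size=300)                       # simulated construct scores
items = pd.DataFrame({f"item_{i}": latent + rng.normal(scale=0.6, size=300)
                      for i in range(1, 5)})

print(items.corr().round(2))                        # inspect inter-item correlations

efa = FactorAnalyzer(n_factors=1, rotation=None)    # single hypothesised factor
efa.fit(items)
print(efa.loadings_.round(2))                       # loadings on that factor
```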

In the last 25 years, confirmatory factor analytic techniques have become the preferred, state-of-the-art method for evaluating convergent validity because they provide an a priori approach and evaluate the adequacy of a proposed factor structure in explaining the covariances among the measures ostensibly developed to measure the proposed construct (Bagozzi and Yi, 1994, Bentler, 1986, Jöreskog, 1993). The promise of this approach is that it not only adopts a hypothesis-testing framework that can be falsified, but also explicitly models the measurement error associated with the measures, producing variables that are ostensibly ‘error free’ (Martin, 1982). This is in keeping with the tenets of classical test theory (Kline, 2005). However, neither variant of factor analysis alone can provide an unequivocal evaluation that the items provide a valid measure of the construct. The notion of convergent validity must be considered alongside face validity assessments to confirm that the content of the items reflects the construct under scrutiny.
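
A minimal sketch of a single-factor confirmatory factor analysis in the spirit described above, assuming the third-party semopy package; the construct name, item names, and data are hypothetical, and the model syntax is one way of specifying an a priori measurement model, not the authors' own analysis.

```python
# Minimal sketch: single-factor CFA on simulated item scores,
# assuming the 'semopy' package (an assumption, not used in the article).
import numpy as np
import pandas as pd
from semopy import Model, calc_stats

rng = np.random.default_rng(2)
latent = rng.normal(size=300)
items = pd.DataFrame({f"item_{i}": latent + rng.normal(scale=0.7, size=300)
                      for i in range(1, 5)})

desc = "attitude =~ item_1 + item_2 + item_3 + item_4"   # a priori measurement model
cfa = Model(desc)
cfa.fit(items)                 # estimates loadings and error variances

print(cfa.inspect())           # parameter estimates (loadings, error terms)
print(calc_stats(cfa))         # fit indices (e.g., CFI, RMSEA) to judge adequacy
```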

Concurrent and discriminant types of validity, along with convergent validity, can be viewed as subcomponents of construct validity. Concurrent validity reflects the degree to which measures of a given construct correlate with like measures, i.e., similar or alternative measures of the same construct. For example, in a sport and exercise context, this might be the correlation between measures of physical self-concept and global self-esteem or physical self-worth from other psychometric measures (Marsh et al., 2002, Marsh et al., 1994), or between measures of attitude toward physical activity (Hagger and Chatzisarantis, 2008, Schutz and Smoll, 1977). However, it is also expected that a self-report measure of a given construct is not associated with constructs that are not theoretically related to that construct. For example, in sport and exercise, it is important that measures of cognitive and somatic anxiety are correlated, but with a relatively small effect size (Burton, 1998, Craft et al., 2003, Martens et al., 1990). Such a relationship suggests that there is some conceptual overlap, which is expected given that they are both emotional states, but that they are clearly distinct components of the anxiety construct. One question that arises is how weak a correlation has to be in order to confirm discriminant validity and how strong it has to be in order to support concurrent validity. There is no hard and fast rule, and such associations depend on the source of the observations and the underlying theory (Bagozzi, 1981a). Guidelines that have been adopted in the past include the effect size taxonomy offered by Cohen (1988) and methods used to formally establish whether a correlation coefficient is significantly different from unity (Bagozzi and Yi, 1989, Diamantopoulos and Siguaw, 2000).
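
The following sketch shows one way such judgements might be operationalised: it computes a correlation between two hypothetical scale scores, derives a Fisher-z confidence interval, and checks the interval against the "different from unity" criterion mentioned above. The data, seed, and any thresholds applied are illustrative assumptions, not prescriptions.

```python
# Minimal sketch: judging concurrent vs. discriminant validity from the
# size of a correlation and whether it differs from unity (hypothetical data).
import numpy as np
from scipy import stats

def fisher_ci(r: float, n: int, alpha: float = 0.05):
    """Confidence interval for a Pearson r via the Fisher z transform."""
    z = np.arctanh(r)
    se = 1 / np.sqrt(n - 3)
    crit = stats.norm.ppf(1 - alpha / 2)
    return np.tanh(z - crit * se), np.tanh(z + crit * se)

rng = np.random.default_rng(3)
x = rng.normal(size=200)                          # scores on one scale
y = 0.4 * x + rng.normal(scale=0.9, size=200)     # scores on a related scale

r, p = stats.pearsonr(x, y)
lo, hi = fisher_ci(r, len(x))
print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# Discriminant validity: a small-to-moderate r whose CI lies well below 1.0;
# concurrent validity: a large r (by Cohen's taxonomy) that is still
# distinguishable from unity.
```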

Predictive validity offers the researcher information as to whether the proposed psychological construct measured by their self-report is a predictor (antecedent) or a dependent variable (consequent) of other key variables according to theory. This is similar to nomological validity, which refers to whether the construct of interest is part of an established network or pattern of effects proposed by theory. Predictive validity is therefore a ‘special case’ of nomological validity, which may encompass a series or pattern of predictions or proposed antecedent–consequent relationships. Often, predictive or nomological validity forms part of a set of formal hypotheses within a research study and should be tested after other forms of validity have been established. However, these are often the most interesting and original validity tests because they are likely to relate to psychological theory. They may also address some of the questions raised by Baumeister et al. (2007) by establishing the role of the self-report measure in changing, explaining, or predicting observed behaviour, although behavioural measures are often self-reports themselves. However, many research reports do not explicitly state that they test predictive validity and, furthermore, in some cases a test of predictive validity may only be equivalent to a test of concurrent validity because the relations are measured at the same point in time and the data are correlational in nature (Rutter, 2007), which opens questions relating to the inference of causality, an issue we will address later.
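
For illustration only, a predictive validity check can amount to regressing a theoretically specified outcome on the construct score. The sketch below uses simulated, hypothetical data and the statsmodels package; it is a toy example under those assumptions, not a recommended analysis pipeline.

```python
# Minimal sketch: a predictive validity check in which the construct score
# predicts a theoretically specified outcome (hypothetical data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
construct = rng.normal(size=250)                      # e.g., scale score at Time 1
outcome = 0.35 * construct + rng.normal(size=250)     # e.g., later intention/behaviour

model = sm.OLS(outcome, sm.add_constant(construct)).fit()
print(model.summary())   # the construct's coefficient bears on predictive validity
```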

The forms of validity we have outlined here are not new or groundbreaking; on the contrary, they represent the culmination of more than a century of research in psychometrics, classical test theory, and self-report or introspective methods in psychology (Kline, 2000, Kline, 2005). What is of concern, however, is the lack of attention that researchers pay to these forms of validity and, as a consequence, the assumptions regarding validity that are made when using self-report measures in tests of effects in sport and exercise psychology. This is an instance where assumptions made at the study or micro-level can have profound effects on research quality at the field or macro-level. If studies persistently neglect these forms of validity, their contribution to the field is diminished. Furthermore, if it is common practice for research articles in a particular literature to provide validity tests that are incomplete or inadequate, such practices may become endemic, leading to a systematic undermining of the quality of the available knowledge in that literature.

We must, however, point out that we are not immune to these criticisms, nor are the problems confined purely to sport and exercise psychologists. Indeed, we confess that much of our own research has not paid sufficient attention to all of these issues relating to validity. Many of our research reports have made assumptions regarding the validity of the measures and methods used therein (e.g., Chatzisarantis and Biddle, 1998, Hagger et al., 2001, Hagger et al., 2001). We should therefore learn from the limitations of our previous work arising from our assumptions of validity and make concerted attempts in our future research endeavours to address these contentious issues and resolve the effects these assumptions may have had on our inferences. This will only serve to improve the quality of the research and make the tests of the proposed effects based on theory more representative of the true effect in the population (notwithstanding causality and representativeness issues, as we will argue later!).

It is important to note that the adoption of self-report psychological measures to investigate a proposed effect does not necessarily mean that researchers should go through a lengthy validation process themselves. What we do advocate is some common sense in establishing validity and in the subsequent use of such measures. Often researchers adopt measures developed and validated by other researchers and use them without carefully evaluating whether the previous validity tests are appropriate and applicable to the context in which they are applying the measure. In such cases, researchers deem it sufficient to acknowledge the previous validation studies to allay any concerns regarding the validity of the measure. However, such an assumption may be erroneous. For example, unless the previous validation tests were conducted in a similar context and in a sample with similar characteristics to those that are the focus of the researcher's investigation, an assumption of validity probably cannot be made. In short, the validity tests in sport and exercise psychology research are frequently too far removed from the context of interest to be transferable. We see this often in research in sport and exercise psychology where the trans-contextual translation of the measure at hand is effectively a ‘leap of faith’. For example, researchers often apply measures that have been developed in adults to research involving children or the elderly, or apply measures developed in a general education context to a sport and exercise context, with nothing more than a subtle rewording of the items of the self-report measure. While such application may have some ‘face’ validity (and even then few researchers corroborate this using expert ratings), we do not know the extent to which such measures can be applied to the context of interest and, as we have found in previous research, instruments frequently do not translate directly across contexts (e.g., Hagger et al., 2007, Hagger et al., 2005). As a consequence, previous tests of validity of self-report measures must be carefully scrutinised by researchers and a value judgement made as to whether the tests of validity were conducted in a sufficiently similar context and sample to generalise to the target context and sample of interest. If not, or if there is any doubt, then the researcher should make provision to conduct his/her own tests of validity.

Having advocated the need to pay attention to types of validity in sport and exercise research and highlighted the fallacies of making such assumptions, it would be remiss of us not to offer some guidelines or solutions (and to show that we have attempted to make amends for our previous transgressions by paying due consideration to validity issues in our own research!). Our recent research with self-determination theory involves the effects of perceived autonomy support, a construct that reflects whether significant others provide support for self-determined or intrinsic motivation, on intrinsic motivation, intentions, and physical activity behaviour. To examine the effects of perceived autonomy support, we adapted a self-report measure developed in a classroom context (Williams & Deci, 1996). We did this systematically from a pool of items with ‘face’ validity and conducted preliminary analyses on a number of samples to develop an exercise-specific instrument called the perceived autonomy support scale for exercise settings (PASSES; Hagger, Chatzisarantis, et al., 2007). The scale achieved convergent validity in a confirmatory factor analysis and discriminant validity from conceptually related but distinct motivational orientations, including intrinsic motivation (Hagger, Chatzisarantis et al., 2007). We were careful to use a measure of motivational orientations that had been developed for an exercise context (BREQ; Mullan, Markland, & Ingledew, 1997) and had been used in such contexts with similar target populations (young people in school settings).

In other studies we tested the predictive validity of the PASSES as a predictor of self-determined motivation and physical activity intentions (Chatzisarantis, Hagger, & Smith, 2007). In addition, we tested its nomological validity by examining its role in an elaborated network or motivational sequence in which the effect of perceived autonomy support on physical activity intentions was mediated by self-determined motivation and the proximal antecedents of intention (Hagger et al., 2005, Hagger et al., 2003, Hagger et al., in press). Importantly, these tests of validity were conducted a priori using latent variables and covariance structure analyses, and were systematic across a series of investigations. Moreover, we also built in a temporal ordering of the hypothesised causal system by measuring the variables at three points in time. As we shall see later, this provides some (but not complete) evidence for the direction of causality among the proposed sequence.

In summary, assumptions about the validity of self-report measures can bring the validity of tests of effects involving those measures into question. This may have serious consequences for the quality of research at the micro-level and, if such practices are rife, may temper the quality of research at the macro-level because it will compromise the contribution that can be made to knowledge. Given the limitations highlighted here and those inherent in self-report measures of psychological constructs, researchers need to be diligent in evaluating the validity of the measures they adopt. Should previous validation tests have been conducted in contexts or samples dissimilar to the proposed research, researchers are advised to evaluate the face, convergent, concurrent, discriminant, predictive, and nomological validity of the self-report measure prior to testing hypotheses. If researchers adopt such a rigorous approach to validity and journal editors demand such standards, it will minimise the caveats caused by methodological assumptions and engender greater confidence in the contribution the work makes to knowledge in sport and exercise psychology.

Sport and exercise scientists know that correlation does not imply causality (James, Mulaik, & Brett, 1982). We realise that we would be preaching to the converted if we were to invoke a discussion on why a significant correlation between two (or more) variables does not mean that one of the variables is causing the other(s). And yet the field is rife with correlational studies, and many ‘tests’ of theories and bodies of literature in the field are built upon findings using correlations between variables. Returning to our cursory glance at the two most recent volumes of the leading publications in the field, 53 of 67 articles (79.1%) in PSE and 22 of 43 articles (51.2%) in JSEP adopted designs that were correlational in nature.

One primary reason for the prevalence of correlational designs in the literature is that such research is comparatively easier to conduct than studies adopting designs that permit a better inference of causality. This is not to say that correlational studies cannot be informative and have their place in sport and exercise psychology, but ‘pure’ experimental psychologists, as Rutter (2007) describes them, claim that only randomised controlled experiments or interventions can provide a true test of the causal effect of one variable on another. Rutter argues that, in conducting correlational studies, researchers tend either to couch the reporting of their results in ‘cautious’ language so as to avoid the inference of causality, although they often make sweeping claims akin to causal inferences in their discussion sections, or to acknowledge the limitation of the inability to infer causation from correlational data and state that their primary purpose was to study association, not causation. Of course, neither scenario is satisfactory: one implies causation in an indirect sense, while the other acknowledges the limitation but leaves mere association, which lacks the really interesting and important information about causal mechanisms.

It is important, therefore, to advocate that sport and exercise psychologists endeavour to provide more robust tests of implied causative effects using carefully controlled, randomised experimental methods. Such tests are an essential part of the researcher's armoury, and the adoption of such designs is important to ensure research quality at the micro- and macro-levels. At the micro-level it will mean a greater ability to infer causality in the specific sample under scrutiny. At the macro-level it will mean that the causal effects will be more generalisable to the population and will thus make an impactful contribution to overall knowledge and advance theory regarding the processes and mechanisms in proposed causal relationships.

However, it is equally important to acknowledge that such experiments or interventions should not be considered the ‘holy grail’ of research inquiry on causal effects. Of course, experiments and interventions are also subject to artifactual random error arising from measurement and sampling inadequacies, regardless of the level of care taken to control the manipulations. More serious is the difficulty in truly inferring causality on the basis of a single, discrete independent variable acting on a dependent variable (James et al., 1982). Indeed, Rutter (2001, 2007) has suggested that there are seldom any single “simple direct determinative causal effects on any outcome” (Rutter, 2007, p. 378). Instead, there are usually multiple factors that have multiple causal effects on a given dependent or outcome variable. Furthermore, systems of causation are very seldom simply linear, direct, and proportional; the mechanisms are often complex, with mediational constructs, interactions or moderators of the effects, or causal effects that are non-linear, such as curvilinear or quadratic effects. Given this level of complexity, researchers advocate that experiments or interventions be part of a systematic approach that adopts multiple research strategies and methods to test the nature of causal effects (Rutter, 2001). Such an approach would provide stronger converging evidence for the true nature of the causation of an outcome or behaviour in sport and exercise psychology.

Researchers who infer causality from data that are correlational in nature often justify their inferences by claiming that theory dictates that the pattern of effects is so. However, while a pattern of causation may be theoretically plausible, falsifying such hypotheses is clearly the purpose of any empirical test, and the test should therefore be sufficient to permit such a judgement. It is important, then, that researchers do not assume that causality in correlational data is any more substantiated by identifying a theory that suggests it to be so (McDonald, 1997). Any such inference based on theory and applied to correlational data should be clearly labelled as speculation. Another position often taken is that longitudinal or prospective data provide stronger evidence of the causal nature of a system. Such designs are certainly preferable to cross-sectional correlational designs, especially if the longitudinal analysis adopts a cross-lagged panel design permitting the researcher to model mutual causation as well as interindividual change or covariance stability over time (Finkel, 1995, Menard, 1991). However, while panel designs may model change and provide evidence for the temporal ordering and unidirectional or mutual effect models of change (Hertzog & Nesselroade, 1987), they do not account for all forms of change and the data are still correlational in nature (Hagger, Chatzisarantis, Biddle, et al., 2001). Nevertheless, such correlational research, cross-sectional and longitudinal, provides some useful evidence as to the links between variables in a proposed causal system.
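
A minimal sketch of the logic of a two-wave cross-lagged analysis, using ordinary regression on simulated data; the variable names are hypothetical, and a full panel analysis would typically use latent-variable models as described above.

```python
# Minimal sketch: two-wave cross-lagged regressions on simulated data.
# Each Time 2 variable is regressed on both Time 1 variables so that the
# cross-lagged paths (x1 -> y2 and y1 -> x2) can be compared.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)                             # e.g., attitude at Time 1
y1 = 0.3 * x1 + rng.normal(size=n)                  # e.g., behaviour at Time 1
x2 = 0.6 * x1 + 0.1 * y1 + rng.normal(size=n)       # Time 2 scores
y2 = 0.3 * x1 + 0.5 * y1 + rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "y1": y1, "x2": x2, "y2": y2})

print(smf.ols("y2 ~ y1 + x1", data=df).fit().params)   # cross-lag: x1 -> y2
print(smf.ols("x2 ~ x1 + y1", data=df).fit().params)   # cross-lag: y1 -> x2
```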

So correlational data are not useless; they just have their limitations. Researchers must not only make these limitations clear from the outset of their research, but also avoid causal inferences that cannot be made and label any causal speculation on the basis of theory as such, rather than confining their caveats to a small postscript in a limitations section. Importantly, researchers should follow the recommendations of Rutter (2001) regarding the utility of different research designs for testing the causal nature of a system in sport and exercise. Rutter advocates that the nature of a causal system should be established through a systematic evaluation that uses a “combination of research strategies” (Rutter, 2001, p. 291) to provide converging evidence for the nature of that system.

To illustrate the necessity of using multiple strategies to support a causal relationship, we return to our series of studies examining the effects of perceived autonomy support on intrinsic motivation in exercise behaviour. The studies paid close attention to the different forms of validity when developing the self-report measures of perceived autonomy support (Hagger, Chatzisarantis et al., 2007), and we used cross-sectional (Chatzisarantis, Hagger, & Brickell, 2008) and longitudinal methods (Hagger et al., 2005, Hagger et al., 2003, Hagger et al., in press) to examine the proposed pattern of correlations between the variables, particularly the mediation mechanisms involved. Here it is clear that the proposed causal system is not a simple, directional, single cause of one variable on an outcome, as previously mentioned (Rutter, 2007), but a network of relations which, in itself, may only be a partial version of the causal chain and the possible influences on the dependent or consequent factors. Of course, these studies used correlational designs, albeit relatively powerful ones using three-wave prospective techniques, and we therefore couched our findings within the limitations of these methods (Hagger et al., in press). In light of the limitations inherent in the correlational designs used previously, we set about testing these effects using experimental methods and designed a field experiment to change perceptions of autonomy support in children by manipulating the autonomy-supportive behaviour of their teachers and examining the effects on self-determined motivation and intentions to engage in physical activity (Chatzisarantis & Hagger, 2009). This field experiment provided us with stronger evidence not only for the causal effect of autonomy support on self-determined motivation, but also for the processes involved in these relationships. Specifically, we tested whether self-determined motivation mediated the effect of the autonomy support manipulation on intentions. This provided essential information about the proposed mechanisms or processes by which changes in autonomy support resulted in intentional behaviour.
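
To make the mediation logic concrete, the following sketch estimates a simple indirect effect (a × b) with a percentile bootstrap on simulated data. It is an illustrative toy example under hypothetical variable names and data, not the authors' actual analysis.

```python
# Minimal sketch: regression-based mediation (indirect effect = a * b) with
# a percentile bootstrap, on simulated, hypothetical data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
treatment = rng.integers(0, 2, size=n).astype(float)             # manipulated autonomy support
mediator = 0.5 * treatment + rng.normal(size=n)                   # self-determined motivation
outcome = 0.4 * mediator + 0.1 * treatment + rng.normal(size=n)   # intention

def path_coef(y, X):
    """OLS slope of y on the first column of X (intercept plus any covariates)."""
    res = sm.OLS(y, sm.add_constant(X)).fit()
    return np.asarray(res.params)[1]

a = path_coef(mediator, treatment)                                  # treatment -> mediator
b = path_coef(outcome, np.column_stack([mediator, treatment]))      # mediator -> outcome, adjusting for treatment
print("indirect effect a*b:", a * b)

boot = []
for _ in range(2000):                                               # percentile bootstrap
    idx = rng.integers(0, n, size=n)
    a_b = path_coef(mediator[idx], treatment[idx])
    b_b = path_coef(outcome[idx], np.column_stack([mediator[idx], treatment[idx]]))
    boot.append(a_b * b_b)
print("95% bootstrap CI:", np.percentile(boot, [2.5, 97.5]))
```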

Our series of studies demonstrates how multiple tests of an effect can provide converging evidence at the macro-level for a hypothesised relationship: the effect of autonomy support on intentional physical activity behaviour. Ideally, this series of studies could have been published simultaneously in a multi-study article. As with all research, however, the picture developed over an extended period of time, and such an ideal scenario seldom emerges. Nevertheless, it is important that researchers are mindful of the assumptions that tend to be made relating to the inference of causality in sport and exercise psychology research. Researchers are reminded of the importance of using multiple methodologies to test the causal nature of an effect. It is also important to acknowledge that any effect is seldom the result of a unitary, deterministic cause of one psychological variable on an outcome variable. Such effects are more likely to be part of a network of causal pathways with a series of mechanisms or processes, such as mediation and moderation, involved.

One of the assumptions frequently made by sport and exercise psychology researchers is that a statistically significant test of an effect represents the true nature of that effect in the population. Many researchers have highlighted the problems presented by making sweeping statistical generalisations such as this (Cohen, 1994, Kirk, 1996, Thompson, 1996, Thompson and Snyder, 1998). Furthermore, these difficulties have also been highlighted by the increased prevalence of meta-analyses and systematic reviews in the sport and exercise psychology literature (Hunter and Schmidt, 1990, Rosenthal and Rubin, 1982). To reiterate, the problem with statistical inference based purely on statistical significance is that significant findings may be biased according to the size of the sample in which the effect is tested. The smaller the sample size, the less likely a researcher is to find a statistically significant effect, which may result in an effect that truly exists in the population going undetected. The converse is also the case: a large sample size may very well result in the detection of an effect that is statistically significant, but a large sample tends to render even very small effects significant when, in fact, they are so weak as to be practically inconsequential.
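
The following sketch simulates this point: the same small population correlation is tested in a small and a large hypothetical sample, and typically only the large sample returns a "significant" p value, even though the underlying effect is identical.

```python
# Minimal sketch: statistical significance depends on sample size,
# not on the size of the effect (simulated, hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def simulate(n, true_r=0.10):
    x = rng.normal(size=n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
    return stats.pearsonr(x, y)

for n in (30, 3000):
    r, p = simulate(n)
    print(f"n = {n:5d}: r = {r:+.2f}, p = {p:.4f}")
# Typically non-significant at n = 30 and highly significant at n = 3000,
# even though the population effect is the same small r = .10.
```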

As a consequence, research methodologists have called upon researchers and journal editors to be equally demanding in requiring the inclusion of effect size statistics in psychological research (Thompson, 1999, Vacha-Haase et al., 2000, Wilkinson, 1999). Although such calls have resulted in a general increase in the reporting of effect size statistics in psychology research, including sport and exercise psychology, a substantial minority of studies fails to do so (Andersen et al., 2007, Thompson and Snyder, 1998, Vacha-Haase, 2001). Returning to the recent two volumes of PSE and JSEP, of the studies that adopted ANOVA, t-tests, or other standard statistical tests of difference to analyse their data, we found that 18 of 23 articles (78.3%) in PSE and 16 of 22 articles (72.7%) in JSEP reported effect size statistics.
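
As a simple illustration of reporting an effect size alongside a significance test, the sketch below computes Cohen's d (using the pooled standard deviation) for a two-group comparison on hypothetical data.

```python
# Minimal sketch: an independent-samples t-test reported together with
# Cohen's d, computed from the pooled standard deviation (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
group_a = rng.normal(loc=0.0, scale=1.0, size=40)
group_b = rng.normal(loc=0.5, scale=1.0, size=40)

t, p = stats.ttest_ind(group_a, group_b)
pooled_sd = np.sqrt(((len(group_a) - 1) * group_a.var(ddof=1)
                     + (len(group_b) - 1) * group_b.var(ddof=1))
                    / (len(group_a) + len(group_b) - 2))
d = (group_b.mean() - group_a.mean()) / pooled_sd
print(f"t = {t:.2f}, p = {p:.3f}, Cohen's d = {d:.2f}")
```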

Once again, this is where we have to confess to falling short of these standards. As editors, associate editors, and reviewers for peer-reviewed scientific journals in sport and exercise psychology, we have also failed to be sufficiently demanding of authors when it comes to reporting effect size statistics. As the previous analysis illustrates, we are not alone. One reason for this may be an ingrained culture oriented around alpha levels, significance testing, and the p < .05 probability level accepted as standard throughout the social sciences and psychology (Cohen, 1994). As a consequence, journal editors and reviewers, and authors of published research in sport and exercise psychology, have not been acceptably proactive in demanding and supplying, respectively, effect sizes in research reports. As Thompson (1999) reports, guidelines such as the APA publication manual (5th ed.) encouraging researchers to report effect size statistics have not been effective in “changing behaviour” (p. 192). Therefore, more effective lobbying is necessary to change journal publication policy to demand the inclusion of effect sizes in research reports, a call which has been heeded by some (Kirk, 1996, Thompson, 1996, Thompson and Snyder, 1998). However, many still fall short of this aim, and the assumption that the importance or contribution of an effect found in a research report can be supported through significance testing alone is erroneous and cannot be resolved unless a trend toward reporting effect sizes is seen.

Another problem raised by the failure to report effect sizes in published research in sport and exercise psychology concerns assumptions regarding the relative importance and meaning of an effect with respect to the overall body of literature in the field. Meta-analyses and systematic reviews have brought such issues to the fore (Chatzisarantis and Stoica, 2009, Cooper, 1990, Glass, 1976, Hagger, 2006, Hagger and Chatzisarantis, in press, Hedges et al., 1989, Hunter and Schmidt, 1990). The promise of meta-analysis is to provide a quantitative synthesis of research findings testing effects across studies while correcting for artifactual variance that may bias study findings. The central metric of a meta-analytic synthesis of research is the effect size (Hagger, 2006). As authors of meta-analyses will attest, conducting such analyses can be quite problematic, as studies often report insufficient effect size data, or even lack sufficient data to calculate or infer an effect size, which means the study must be excluded from the analysis unless the author can be contacted, rendering the ‘universe’ of available studies incomplete (Field, 2003).

Aside from the failure to report such data hindering meaningful syntheses of research, meta-analytic theory also highlights the problems associated with relying solely on significance testing. Examining empirical tests of a given effect in the literature may reveal that some tests are statistically significant while others are not. This is likely to lead to the conclusion that the tests of the effect in the literature cannot resolve whether or not the effect truly exists in the population (Hunter & Schmidt, 1990). One possibility is that some of the tests were non-significant due to additional variance caused by errors in the conduct of the study, such as sampling error or measurement error (Chatzisarantis & Stoica, 2009). Another possibility is that the variation in the size of the effect is due to conditions or variables that moderate the effect. Again, this speaks to Rutter's (2007) contention that few effects are truly singular and deterministic; rather, they are part of a complex network of effects. Moderating variables, such as demographic, methodological, or psychological variables, may be responsible for the variation. Meta-analysis can resolve this by correcting for these sources of error to produce an averaged effect size statistic which, if the analysis is conducted using sufficient data and the correct analytic technique, will provide an accurate estimate of the true size of the effect in the population and indicate whether moderators exist. Overall, conclusions based solely on an observed distribution of significant effects across studies can often lead a researcher to label the body of literature as ‘inconclusive’, and meta-analysis shows that such a conclusion is often misplaced and may be due to artifactual error that biases the effect size in individual studies (Hagger, 2006). Therefore, the assumption that significance testing can provide unequivocal evidence for the existence of an effect in sport and exercise psychology can be steeped in fallacy, and meta-analytic theory illustrates this. The resolution lies in reporting effect size statistics, such as Cohen's d or η², rather than relying solely on statistical significance testing (Wilkinson, 1999).
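
A minimal sketch of the core computation behind such a synthesis: correlations from several hypothetical studies are Fisher z-transformed, weighted by inverse variance, averaged, and back-transformed to a pooled r. Real meta-analyses would add artifact corrections and moderator tests; the study values below are invented for illustration.

```python
# Minimal sketch: fixed-effect meta-analytic synthesis of correlations
# (hypothetical study effect sizes and sample sizes).
import numpy as np

study_r = np.array([0.15, 0.32, 0.05, 0.28, 0.22])   # hypothetical study correlations
study_n = np.array([60, 210, 45, 150, 90])            # hypothetical sample sizes

z = np.arctanh(study_r)                # Fisher z transform
w = study_n - 3                        # inverse of Var(z) = 1/(n - 3)
z_bar = np.sum(w * z) / np.sum(w)      # inverse-variance weighted mean
se = 1 / np.sqrt(np.sum(w))
r_bar = np.tanh(z_bar)                 # back-transform to r
ci = np.tanh([z_bar - 1.96 * se, z_bar + 1.96 * se])
print(f"pooled r = {r_bar:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```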

Assumptions that a large effect size equates to an effect that has genuine significance in the real world can also be erroneous. While effect size can provide very useful information on the efficacy of an intervention or manipulation in changing an outcome or behaviour in the sport and exercise sciences, it cannot provide information on the extent to which that effect will make changes to outcomes or behaviours that are meaningful to target groups, such as people who want to do more exercise or athletes who desire to improve their sport performance. Kirk (1996) suggests that researchers must therefore provide an evaluation of the meaning of the changes that result from their interventions by commenting on the practical significance of their findings. Indeed, Jacobson and Truax (1991, p. 12) illustrate that large effect sizes may not convey the true practical or clinical significance of an effect and its potential to make a difference to people's lives:

“…if a treatment for obesity results in a mean weight loss of 2lb [0.91 kg] and if subjects in a control group average zero weight loss, the effect size could be quite large if variability within the groups were low. Yet the large effect size would not render the results any less trivial from a clinical standpoint. Although large effect sizes are more likely to be clinically significant than small ones, even large effect sizes are not necessarily clinically significant.”
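
The arithmetic behind the quoted example can be made explicit. In the sketch below, the within-group standard deviation is a hypothetical value chosen only to show how low variability inflates the standardised effect while the raw change remains clinically trivial.

```python
# Minimal sketch: a trivial 2 lb (0.91 kg) mean weight loss versus zero in a
# control group yields a very large Cohen's d when within-group variability
# is low (the SD here is a hypothetical, illustrative value).
mean_treatment_loss = 0.91   # kg
mean_control_loss = 0.0      # kg
within_group_sd = 0.3        # kg; low variability, chosen for illustration

d = (mean_treatment_loss - mean_control_loss) / within_group_sd
print(f"Cohen's d = {d:.1f}")   # ~3.0: 'huge' statistically, trivial in practice
```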

Researchers in the sport and exercise sciences must, therefore, not assume that a statistically significant effect size, however large, will make a contribution to target outcomes or behaviours that is meaningful in a practical or clinical sense. Reasoned interpretation of research findings based on what is important to people in a practical sense is essential if sport and exercise psychology is to be a socially relevant discipline.

In this paper we identified some of the assumptions made by researchers in sport and exercise psychology, the problems associated with making such assumptions, and how they affect research quality. We have noted how such assumptions not only affect research quality at the individual study or micro-level but also have the potential to affect the meaning and contribution research makes to knowledge in the field at the macro-level. We have also attempted to provide some guidelines and recommendations as to how researchers can allay the problems associated with such assumptions. We began with assumptions relating to types of validity, particularly for the self-report measures and methods that are adopted by the majority of sport and exercise psychology studies. The many forms of validity were reviewed and we suggested that researchers too often make assumptions regarding the validity of their instruments but fail to conduct sufficient tests to support such assumptions, rendering the validity of hypothesis tests based on such measures open to question. Next we commented on the inference of causality in sport and exercise psychology research. We argued that researchers often rely too heavily on correlational data and assume a causal nature in their tests of effects when using such designs. We suggested that while correlational data have a place in testing effects in sport and exercise psychology, knowledge concerning the causal nature of an effect needs to come from converging evidence delivered through multiple tests of the effect using multiple methodologies, including randomised controlled experiments. Finally, we examined the importance of reporting effect size statistics in sport and exercise psychology research. Despite persistent calls from statistical theorists, authors do not always report effect size statistics, and journal editors need to demand that such data are included (Kirk, 1996, Thompson, 1999, Wilkinson, 1999). We propose the following guidelines and recommendations:

Authors and researchers should (1) evaluate a priori whether previous validity tests of the self-report measures of psychological constructs they propose to use have been conducted in samples and contexts suitably similar to those they propose to study and, if not, seek to provide their own tests of face, convergent, concurrent, discriminant, predictive, and nomological validity, (2) view their research from a broad perspective to evaluate its place as a test of a proposed causal relationship, be mindful of making causal inferences where they are not warranted, and, wherever possible, adopt study designs that will assist in inferring causality, preferably using multiple methodologies, and (3) report effect size statistics in research and provide clear, unequivocal statements regarding whether statistically significant findings are meaningful and have “practical significance”.

Editors and reviewers should (1) be aware of the types of validity, demand high standards of validity from authors reporting research using self-report measures, and be mindful of researchers making ‘leaps of faith’ when declaring self-report measures developed in different samples or diverse contexts as valid for use in their research, (2) take care to identify the use of causal language by researchers reporting results of correlational research, demand that any such inferences are clearly labelled as speculative and based on theory not data, and advocate the adoption of multi-study papers that use a combination of methods to evaluate causal effects in a variable system, and (3) demand that authors include effect size statistics when reporting research findings and make public that the inclusion of effect size statistics is a requirement through published guidelines for authors and journal policy.

Many readers will read this article and say “I know that” and “I've heard that before”, and they will, undoubtedly, be correct. However, the fact remains that assumptions remain rife in sport and exercise psychology research and, unless this message is heeded, we believe they will hinder the progress of knowledge in our field and limit the conclusions we can draw from the reported research. We therefore conclude that following these simple guidelines in the design, analysis, and reporting of research findings will raise the standard of research in the field. We firmly believe that sport and exercise psychology can lead the way in dispelling the assumptions that many psychologists make in conducting and reporting research and demonstrate to the field of psychology and the greater social science community that research in this field is of the highest quality.

References (72)

  • Bagozzi, R. P., et al. (1989). The degree of intention formation as a moderator of the attitude–behavior relationship. Social Psychology Quarterly.
  • Bagozzi, R. P., et al. Advanced topics in structural equation models.
  • Baumeister, R. F., et al. (2007). Psychology as the science of self-reports and finger movements: whatever happened to actual behaviour? Perspectives on Psychological Science.
  • Bentler, P. M. (1986). Structural modeling and psychometrika: a historical perspective on growth and achievements. Psychometrika.
  • Biddle, S. J. H., et al. Theoretical frameworks in exercise psychology.
  • Burton, D. Measuring competitive state anxiety.
  • Chatzisarantis, N. L. D., et al. (1998). Functional significance of psychological variables that are included in the theory of planned behaviour: a self-determination theory approach to the study of attitudes, subjective norms, perceptions of control and intentions. European Journal of Social Psychology.
  • Chatzisarantis, N. L. D., et al. (2009). Effects of an intervention based on self-determination theory on self-reported leisure-time physical activity participation. Psychology and Health.
  • Chatzisarantis, N. L. D., et al. (2007). Influences of perceived autonomy support on physical activity within the theory of planned behavior. European Journal of Social Psychology.
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences.
  • Cohen, J. (1994). The earth is round (p < .05). American Psychologist.
  • Conner, M., et al. (2007). Conscientiousness and the intention–behavior relationship: predicting exercise behavior. Journal of Sport and Exercise Psychology.
  • Cooper, H. Meta-analysis and the integrative research review.
  • Craft, L. L., et al. (2003). The relationship between the competitive state anxiety inventory-2 and sport performance: a meta-analysis. Journal of Sport and Exercise Psychology.
  • Diamantopoulos, A., et al. (2000). Introducing LISREL.
  • Field, A. P. (2003). The problems using fixed-effects models of meta-analysis on real-world data. Understanding Statistics.
  • Finkel, S. E. (1995). Causal analysis with panel data.
  • Glass, G. V. (1976). Primary, secondary and meta-analysis of research. Educational Researcher.
  • Hagger, M. S. (2006). Meta-analysis in sport and exercise research: review, recent developments, and recommendations. European Journal of Sport Science.
  • Hagger, M. S., et al. (2007). Cross-cultural validity and measurement invariance of the social physique anxiety scale in five European nations. Scandinavian Journal of Medicine and Science in Sports.
  • Hagger, M. S., et al. (2005). Physical self-perceptions in adolescence: generalizability of a multidimensional, hierarchical model across gender and grade. Educational and Psychological Measurement.
  • Hagger, M. S., et al. (2001). The influence of self-efficacy and past behaviour on the physical activity intentions of young people. Journal of Sports Sciences.
  • Hagger, M. S., et al. (2001). Antecedents of children's physical activity intentions and behaviour: predictive validity and longitudinal effects. Psychology and Health.
  • Hagger, M. S., et al. (2005). First- and higher-order models of attitudes, normative influence, and perceived behavioural control in the theory of planned behaviour. British Journal of Social Psychology.
  • Hagger, M. S., & Chatzisarantis, N. L. D. Integrating the theory of planned behaviour and self-determination theory in...
  • Hagger, M. S., & Chatzisarantis, N. L. D. (2008). Youth attitudes. In A. L. Smith & S. J. H. Biddle (Eds.), Youth...