Flexibility exercise training for adults with fibromyalgia

Summary of findings for the main comparison. Flexibility exercise training compared with aerobic exercise training for adults with fibromyalgia

Patient or population: adults with fibromyalgia
Settings: group and home program
Intervention: flexibility exercise training
Comparison: aerobic training
Outcome: measured at the end of intervention
Outcomes	Anticipated absolute effects* (95% CI)		Relative effect (95% CI)	№ of participants (studies)	Certainty of the evidence (GRADE)	Comments
Outcomes	Risk with aerobic (end of intervention)	Risk with flexibility	Relative effect (95% CI)	№ of participants (studies)	Certainty of the evidence (GRADE)	Comments
Health‐related quality of life assessed with: FIQ Total (0 is best) 0‐to‐100‐millimeter scale Follow‐up: range 12 weeks to 20 weeks⁵	Mean health‐related quality of life was 42 mm.	Mean 4.14 mm higher (5.77 lower to 14.05 higher)	‐	193 (2 RCTs)	⊕⊝⊝⊝ VERY LOW^1,2,3,4	Absolute change was 4% worse (6% better to 14% worse). Relative change⁷ in the flexibility groups compared to the aerobic groups was 7.53% worse (10.5% better to 25.5% worse). NNTB n/a⁶
Pain intensity assessed with: VAS (0 is best) 0‐to‐100‐millimeter scale Follow‐up: range 8 weeks to 20 weeks⁸	Mean pain intensity was 52 mm.	Mean 4.72 mm higher (1.39 lower to 10.83 higher)	‐	266 (5 RCTs)	⊕⊝⊝⊝ VERY LOW^1,3,4	Absolute change was 5% worse (1% better to 11% worse). Relative change in the flexibility groups compared to the aerobic groups was 6.7% worse (2% better to 15.4% worse).⁷ NNTB n/a⁶
Fatigue assessed with: FIQ and SF‐36 converted (0 is best) 0‐to‐100‐millimeter scale Follow‐up: range 8 weeks to 20 weeks⁹	Mean fatigue was 71 mm.	Mean 4.12 mm lower (13.31 lower to 5.06 higher)	‐	75 (2 RCTs)	⊕⊝⊝⊝ VERY LOW^1,4	Absolute change was 4% better (13% better to 5% worse). Relative change in the flexibility groups compared to the aerobic groups was 6.02% better (19.4% better to 7.4% worse).⁷ NNTB n/a⁶
Stiffness assessed with: FIQ (0 is best) 0‐to‐100‐millimeter scale Follow‐up: 8 weeks¹⁰	Mean stiffness was 79 mm.	Mean 29.6 mm lower (51.47 lower to 7.73 lower)	‐	15 (1 RCT)	⊕⊝⊝⊝ VERY LOW^4,11	Absolute change was 30% better (8% better to 51% better). Relative change in the flexibility group compared to the aerobic group was 39% better (10% better to 68% better).⁷ NNTB n/a⁶
Physical function assessed with: FIQ and SF‐36 converted (0 is best) 0‐to‐100‐millimeter scale Follow‐up: range 8 weeks to 20 weeks¹²	Mean physical function was 17 units.	Mean 6.04 units higher (3.95 lower to 16.03 higher)	‐	60 (1 RCT)	⊕⊝⊝⊝ VERY LOW^1,4	Absolute change was 6% worse (4% better to 16% worse). Relative change in the flexibility group compared to the aerobic group was 13.97% worse (9.1% better to 37.1% worse).⁷ NNTB n/a⁶
Withdrawals All‐cause attrition Follow‐up: 8 to 20 weeks	19 per 100 (study population)	18 per 100 (11 to 29)	RR 0.97 (0.61 to 1.55)	301 (5 RCTs)	‐	Absolute change was 1% fewer withdrawals in the flexibility groups (8% fewer to 21% more). Relative change in the flexibility group was 3% fewer (39% fewer to 55% more).
Adverse events—increase in symptoms, injuries, or serious adverse events	Studies did not measure or report events.	Not all studies measured or reported events.	‐	No reliable estimate	⊕⊝⊝⊝ VERY LOW^1,4	In 1 of the 5 studies, 1 participant in the flexibility group was reported as having a minor adverse event. The following statement was provided: "a patient in the FLEX group had tendinitis of the Achilles tendon, which responded to treatment with local heat and a reduction in exercise for 14 days" (McCain 1988; page 1138). However, it is unclear whether the tendinitis was related to intervention participation.
*The risk in the intervention group (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval; FIQ: Fibromyalgia Impact Questionnaire; NNTB: number needed to treat for an additional beneficial outcome; NNTH: number needed to treat for an additional harmful outcome; RCT: randomized controlled trial; RR: risk ratio; SF‐36: 36‐item Short Form Health Survey; VAS: visual analogue scale
GRADE Working Group grades of evidence High certainty: We are very confident that the true effect lies close to that of the estimate of the effect. Moderate certainty: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. Low certainty: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect. Very low certainty: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.
¹Downgraded two levels due to risk of bias (e.g. selection and performance bias). ²Downgraded one level due to inconsistency (i.e. heterogeneity among trials found). ³Downgraded two levels because flexibility was used as a proxy (i.e. flexibility exercise was used along with relaxation as the control in the study). ⁴Downgraded one level due to imprecision (sample size lower than 400 rule‐of‐thumb). ⁵Study authors: Richards 2002; Valim 2003. ⁶NNTB or NNTH was not calculated, as there were no clinically important between‐group differences. ⁷Relative change calculation as per Cochrane Musculoskeletal Review Group procedures: absolute change divided by the baseline mean of the highest‐weighted aerobic group. Richards 2002 (value was 55 on a 0‐to‐100‐point scale on the FIQ for health‐related quality of life, and 70.4 on a 0‐to‐100‐point scale on the VAS for pain). Valim 2003 (value was 68.4 points on a 0‐to‐100‐point scale on the SF‐36 Vitality for fatigue, and 43.23 on a 0‐to‐100‐point scale on the SF‐36 for function). Bressan 2008 (value was 75.7 points on a 0‐to‐100‐point scale on the FIQ for stiffness). ⁸Study authors: Bressan 2008; Matsutani 2012; McCain 1988; Richards 2002; Valim 2003. ⁹Study authors: Bressan 2008; Valim 2003. ¹⁰Study author: Bressan 2008. ¹¹Downgraded one level for possible selection and performance bias. ¹²Study author: Valim 2003.
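
The relative change figures in the Comments column can be reproduced from footnote 7. The sketch below is illustrative only: the function names are hypothetical, and the inputs are the mean differences and baseline means quoted in the table and footnotes above. It also shows the corresponding-risk arithmetic described in the first table footnote (assumed risk multiplied by the relative effect).

```python
# Illustrative sketch; function names are hypothetical, inputs are the
# values quoted in the summary of findings table and footnote 7.

def relative_change_percent(absolute_change, baseline_mean):
    """Absolute change divided by the baseline mean of the
    highest-weighted aerobic group, expressed as a percentage."""
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return 100.0 * absolute_change / baseline_mean


def corresponding_risk_per_100(assumed_risk_per_100, risk_ratio):
    """Risk in the intervention group: assumed comparison-group risk
    multiplied by the relative effect (risk ratio)."""
    return assumed_risk_per_100 * risk_ratio


# (mean difference, baseline mean of highest-weighted aerobic group);
# positive values mean flexibility scored higher (worse) than aerobic.
rows = {
    "HRQoL": (4.14, 55.0),               # Richards 2002, FIQ
    "Pain intensity": (4.72, 70.4),      # Richards 2002, VAS
    "Fatigue": (-4.12, 68.4),            # Valim 2003, SF-36 Vitality
    "Stiffness": (-29.6, 75.7),          # Bressan 2008, FIQ
    "Physical function": (6.04, 43.23),  # Valim 2003, SF-36
}

relative = {
    outcome: round(relative_change_percent(diff, baseline), 2)
    for outcome, (diff, baseline) in rows.items()
}

# Withdrawals: 19 per 100 in the aerobic groups, RR 0.97.
withdrawals_flex = corresponding_risk_per_100(19, 0.97)

if __name__ == "__main__":
    for outcome, value in relative.items():
        print(f"{outcome}: {value}%")
    print(f"Withdrawals with flexibility: {withdrawals_flex:.2f} per 100")
```

The point estimates agree with the table after rounding. Interval bounds recomputed from the rounded assumed risk of 19 per 100 may differ by one unit from the tabulated values, presumably because the table was generated from unrounded data.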

Background

Description of the condition

Fibromyalgia syndrome is defined as a condition of generalized, chronic pain lasting at least three months accompanied by widespread muscular tenderness (Wolfe 2016). Individuals with this condition may also experience some degree of decreased energy, fatigue, stiffness, sleep disturbances, depression, memory problems, anxiety, tenderness to touch, balance challenges, and sensitivity to loud noises, bright lights, odors, and cold (Bennett 2014; Macfarlane 2017; Wolfe 2016). Additionally, cognitive impairment, sexual dysfunction, and reduced physical functioning may be experienced (Ghavidel‐Parsa 2015; Zettel‐Watson 2011). These symptoms compromise quality of life, thus impacting home and work environments and possibly leading to a loss of productivity, unemployment, and disability (Ghavidel‐Parsa 2015). Genetic factors may contribute to the development of fibromyalgia through a dysfunctional stress response resulting from the hypothalamo‐pituitary axis following a triggering event (Fitzcharles 2013).

Based on 2012 Canadian diagnostic criteria, available estimates of the prevalence of fibromyalgia in Canada suggest that 2% to 3% of the population experiences the condition, and that it more commonly affects females (Fitzcharles 2013). Other countries have reported similar prevalence rates, using Wolfe 1990 or Wolfe 2010 diagnostic criteria, ranging from 0.4% in Greece and 0.6% in Thailand to 6.4% in the United States and 8.8% in Turkey (Queiroz 2013). Worldwide, the estimated prevalence of fibromyalgia based on previous diagnostic criteria is 2.7%, including 4.1% females and 1.4% males (Queiroz 2013). Following the modified 2010 American College of Rheumatology diagnostic criteria for fibromyalgia, the prevalence of fibromyalgia in the United Kingdom has increased from 1.7% to 5.4% (Jones 2015). With these more recent criteria, the condition is still disproportionately experienced by females, though a greater proportion of males are now being diagnosed with fibromyalgia, as sex or gender ratios have reduced from 13.7:1 to 2.3:1 (Jones 2015). The most recent fibromyalgia criteria, updated in 2016, show 96.2% agreement with the 2011 criteria, suggesting that the increased diagnosis rates since the 2011 criteria may continue (Ablin 2017; Wolfe 2016). Fibromyalgia is present among individuals with musculoskeletal disorders, those with other illnesses such as HIV infection or Lyme disease (Buskila 1990; Dinerman 1992), and people with psychological disorders such as depression (MacFarlane 1999). This highlights the diversity of individuals who may experience this condition (Wolfe 2016), as well as the varying comorbidity present.

Many people with fibromyalgia are hesitant to engage in physical activity due to a fear of symptom exacerbation following exercise (Nijs 2013), thus potentially increasing risks of additional comorbidities (Nijs 2013). Individuals with fibromyalgia often experience comorbid illnesses, including musculoskeletal conditions, cardiovascular disorders, endocrinological disorders, spondylosis/intervertebral disc disorders and other back problems, irritable bowel syndrome, interstitial cystitis/painful bladder syndrome, chronic pelvic pain, temporomandibular joint disorder, depression, anxiety, and other psychiatric disorders (Ghavidel‐Parsa 2015).

Fibromyalgia care and comorbidities require significant healthcare resources and costs (Ghavidel‐Parsa 2015). Healthcare costs include healthcare visits and hospitalizations, pharmaceuticals, and extensive diagnostic testing (Ghavidel‐Parsa 2015). On average, individuals with fibromyalgia make 10 to 18 primary care appointments per year and are hospitalized every 3 years (Ghavidel‐Parsa 2015). Several pharmacotherapy treatments have shown tier 2 evidence for moderate pain relief (Macfarlane 2017). Cochrane Reviews of these therapies have included pregabalin and gabapentin (antiepileptics) (Derry 2016; Macfarlane 2017; Roskell 2011; Wiffen 2013), cyclobenzaprine (a muscle relaxant) (Macfarlane 2017; Tofferi 2004), duloxetine, milnacipran, and fluoxetine (serotonin and norepinephrine reuptake inhibitors) (Hauser 2012; Hauser 2013; Macfarlane 2017; Ormseth 2010; Roskell 2011), tramadol (an opioid pain medication and serotonin and norepinephrine reuptake inhibitor) (Macfarlane 2017; Roskell 2011), and amitriptyline (a tricyclic antidepressant) (Hauser 2012; Macfarlane 2017; Moore 2012); the evidence has been of moderate and high certainty. Non‐pharmacologic treatments of fibromyalgia have recently been recommended (Fitzcharles 2013; Macfarlane 2017). Cochrane Reviews of non‐pharmacologic treatments have identified moderate‐certainty evidence for fibromyalgia management including aerobic exercise (Bidonde 2017; Busch 2007). Additional reviews have identified low‐certainty evidence for aquatic exercise (Bidonde 2014), resistance exercise (Busch 2013), cognitive behavioral therapy (Bernardy 2013), acupuncture (Deare 2013), and mind‐body therapy (Theadom 2015).

Exercise training is now recognized as the cornerstone of treatment and management strategies for fibromyalgia as it represents the strongest evidence available (Fitzcharles 2013; Macfarlane 2017). Non‐pharmacological treatments, especially exercise training, are recommended as the first treatment option for fibromyalgia (Macfarlane 2017). Fibromyalgia treatment recommendations include individualized exercise training tailored to a person's physical abilities and level of conditioning in exercises enjoyed or preferred by the individual (Fitzcharles 2013; Nijs 2013).

Description of the intervention

Flexibility exercise training is a type of exercise that focuses on improving or maintaining the range of motion in muscles and joint structures by holding or stretching the body in specific positions (ACSM 2013). Joint range of motion is an important physical characteristic that influences the capacity to perform activities of daily living (Mulholland 2001). Muscle stretching exercises increase the length of the muscle (or muscle group) beyond what would customarily be used in normal activity. This can improve non‐clinical populations' range of motion temporarily right after flexibility exercises, as well as chronically after approximately three to four weeks of regular stretching at a frequency of at least two to three times a week (de Weijer 2003; Decoster 2005; Guissard 2006; Kokkonen 2007; Radford 2006; Reid 2004). Range of motion may improve in as few as 10 sessions with an intensive program (Guissard 2004).

Different types of stretching exercises can improve range of motion. Ballistic methods use the momentum of the moving body segment to produce the stretch. These are commonly used as a warm‐up (Woolstenhulme 2006). Dynamic or slow movement stretching involves a gradual transition from one body position to another, with a progressive increase in reach and range of motion as the movement is repeated several times (McMillian 2006). Static stretching involves slowly stretching a muscle‐tendon group and holding the position for a period (i.e. 10 s to 30 s for young people and 30 s to 60 s for older people; Decoster 2005; Feland 2001). Static stretching can be active or passive (Winters 2004). Active static stretching involves holding the stretched position using the strength of the agonist muscle. In passive static stretching, a position is assumed while holding a limb or other part of the body with or without the assistance of a partner or device. Static stretching, holding at the point of tightness or slight discomfort, is the most commonly used stretching mode (Kay 2015). Proprioceptive neuromuscular facilitation (PNF) methods take several forms but typically involve an isometric contraction of the selected muscle–tendon group followed by a static stretching of the same group, and require partner assistance (Rees 2007; Sharman 2006). Proprioceptive neuromuscular facilitation regularly produces greater increases in range of motion; however, it can be problematic, as performing these contractions can be painful and induce muscle damage (Kay 2015).

Low levels of flexibility have been associated with postural problems, pain, injuries, decreased local vascularization, and increased neuromuscular tensions (Coelho 2008). In fact, flexibility training programs have been used to improve a person's well‐being and as a tool for symptom management in different clinical populations such as those with major depressive disorders (Ambrose 2015; Costa 2009; Jones 2006; Lanuez 2011).

How the intervention might work

The main goal of flexibility training is usually to improve or maintain range of motion in major muscle–tendon groups in accordance with individualized goals (ACSM 2013; Garber 2011). Flexibility training improves postural stability and balance (Costa 2009), and enhances physical function, range of motion (Jones 2002; Valencia 2009), and muscle strength (Jones 2006). Flexibility training also decreases such fibromyalgia symptoms as pain (Valencia 2009), muscle stiffness (Chen 2011), fatigue, and psychological factors (anxiety and depression) (Ambrose 2015; Lanuez 2011; Valencia 2009). It may be speculated that improved flexibility training could also enhance self‐perceived ability to perform activities of daily living, and thereby improve psychosocial factors such as depressive symptoms (Soriano‐Maldonado 2016) and social interaction, which are related to mental health and mood (Peluso 2005). Flexibility training may thus be beneficial for both fitness improvements and symptom control. Since stiffness and reduced range of motion have been shown to reduce health‐related quality of life (HRQoL) in individuals with fibromyalgia (Valencia 2009), flexibility training may contribute to decreasing these physical difficulties, thus improving HRQoL.

Flexibility training may be implemented as a program of static stretches that are held for 10 s to 30 s (ACSM 2013). Such activity may be used as part of relaxation programs that have demonstrated a positive effect on physical functioning and pain (Theadom 2015).

Why it is important to do this review

Flexibility exercises are advocated for the general public as a method to address stiffness and increase or maintain range of motion of major joints of the body (such as shoulders, hips, knees, ankles, back, neck) in order to maintain or improve general physical function (ACSM 2013). Since incorporating exercise into one's daily routine is not a small endeavour, it is the responsibility of clinicians and researchers to identify whether flexibility training should be undertaken both to improve and maintain physical function and to improve symptoms of fibromyalgia. If this form of exercise contributes to symptom improvement, it is important to identify which symptoms are most affected and the magnitude of the improvement. This review is important because flexibility training exercise is commonly recommended by consumer organizations designed to provide peer support (such as the National Fibromyalgia Association (www.fmaware.org/)). These organizations include individuals with fibromyalgia and healthcare providers, policymakers, and researchers (such as the National Fibromyalgia and Chronic Pain Association (https://fibroandpain.org/)). This review was needed to examine whether flexibility training does or does not have an effect on symptoms of fibromyalgia and HRQoL. Definitions for some of the terms utilized in this review can be found in the "Glossary of terms" (Appendix 1).

Objectives

To evaluate the benefits and harms of flexibility exercise training interventions for adults with fibromyalgia.

To assess the following specific comparisons:

- Flexibility versus untreated controls (e.g. usual medical treatment)
- Flexibility versus aerobic interventions (e.g. treadmill walking)
- Flexibility versus resistance training (e.g. progressive training using weight machines)
- Flexibility versus other interventions (e.g. Pilates, friction massage, medication)

Methods

Criteria for considering studies for this review

Types of studies

We included trials described as randomized, even if the methods of generating the random sequence were unclear or unreported, or the method of allocating participants was likely to be quasi‐random (e.g. by alternation, date of birth, or similar pseudo‐randomized method). We did not include studies using cross‐over or cluster‐randomized designs. We set no restriction on the number of participants included in the studies.

Types of participants

We included studies that examined adults with fibromyalgia (≥ 18 years of age). We selected studies that used published criteria for the diagnosis (or classification) of fibromyalgia. The American College of Rheumatology (ACR) 1990 criteria have long been used as the standard for classifying individuals as having fibromyalgia (Wolfe 1990). By this method, an individual is classified as having fibromyalgia when they have experienced widespread pain lasting longer than three months with at least 11 active tender points (TP). Tender points are noted at 18 designated locations on the body and are defined as active if pain can be elicited by applying 4‐kilogram tactile pressure.
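
The ACR 1990 rule as summarized above can be written as a simple check. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, and the full criteria (for example, how widespread pain is established) are reduced to the two conditions given in this paragraph.

```python
from dataclasses import dataclass

# Constants taken from the paragraph above.
DESIGNATED_SITES = 18          # tender point locations examined
MIN_ACTIVE_TENDER_POINTS = 11  # active points needed for classification
MIN_PAIN_MONTHS = 3            # pain must last longer than this


@dataclass
class TenderPointExam:
    """Hypothetical record of one examination (illustration only)."""
    widespread_pain_months: float
    active_tender_points: int  # sites painful at 4 kg of pressure


def meets_acr_1990(exam: TenderPointExam) -> bool:
    """Return True when both conditions described in the text hold."""
    if not 0 <= exam.active_tender_points <= DESIGNATED_SITES:
        raise ValueError("active tender points must be between 0 and 18")
    return (
        exam.widespread_pain_months > MIN_PAIN_MONTHS
        and exam.active_tender_points >= MIN_ACTIVE_TENDER_POINTS
    )


if __name__ == "__main__":
    print(meets_acr_1990(TenderPointExam(6, 12)))   # both conditions met
    print(meets_acr_1990(TenderPointExam(6, 10)))   # too few tender points
    print(meets_acr_1990(TenderPointExam(2, 14)))   # pain duration too short
```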

A diagnostic tool, ACR 2010 (Wolfe 2010), which does not rely upon a physical tender point examination, is also available both as a clinician‐administered questionnaire and as a survey questionnaire (Wolfe 2011). This measure includes the Widespread Pain Index (19 areas representing anterior and posterior axis and limbs), in addition to a Symptom Severity Scale that contains items related to secondary symptoms such as fatigue, sleep disturbances, cognition, and somatic complaints. Scores on both measures are used to determine whether a person qualifies for a “case definition” of fibromyalgia. This tool has been found to correctly classify 88% of cases that meet ACR 1990 criteria, and it allows ongoing monitoring of symptom change among individuals with a current or previous fibromyalgia diagnosis (Wolfe 2010). Although measures focusing on tender point counts have been widely applied in clinical and research settings, the methods described by Wolfe 2010 and Wolfe 2011 seem to classify people with fibromyalgia more efficiently, while allowing improved monitoring of disease status over time.

We also included studies where participants were diagnosed with fibromyalgia under different published diagnostic criteria, such as those by Smythe 1979 and Yunus 1981. Although some differences between published fibromyalgia diagnostic or classification criteria are known, for the purposes of this review, we considered all criteria to be acceptable and comparable.

Types of interventions

We examined trials that studied flexibility exercise training interventions regardless of frequency, duration, or intensity. We defined flexibility as movements of a joint or a series of joints through the complete range of motion that targeted major muscle‐tendon units (ACSM 2013).

We have presented data on interventions using the Frequency, Intensity, Time, Type, Volume, Pattern and Progression (FITT‐VP) principles of exercise prescription (Table 1) outlined for healthy individuals in Appendix 2 (ACSM 2013).

Table 1. FITT‐VP parameters

Author, year, intervention	Frequency, times per week	Length in weeks	Intensity	Time/duration	Session, minutes	Type/mode	Pattern
Flexibility versus control
Assumpção 2017	2 times/week	12 weeks	Stretch intensity was increased gradually to the point of moderate discomfort.	30 s	40 min	Supervised program focusing on large muscles (triceps surae, gluteus, ischiotibial, paravertebral, latissimus dorsi, hip adductor, pectoralis)	Not mentioned
Flexibility versus aerobic
Bressan 2008	1 time/week	8 weeks	Not mentioned	30 s	40 to 45 min	Static muscular stretching of the triceps surae, ischiotibial, gluteal, paravertebral, latissimocondyloideus, pectoral, trapezius, and respiratory muscles. Stretching was performed in dorsal decubitus or sitting.	Performed in a series of 5 repetitions
Matsutani 2012	1 time/week	8 weeks	Not mentioned	30 s	45 min	All exercises emphasized breathing and postural alignment corrections.	For each exercise there were 4 replications, holding the stretch for 30 s on each repetition, followed by 30 s of rest.
McCain 1988	3 times/week	20 weeks	Not mentioned	Not mentioned	60 min	Exercise consisted of flexibility maneuvers such that sustained heart rate responses greater than 115 beats per minute.	Not mentioned
Richards 2002	2 times/week	12 weeks	Not mentioned	Not mentioned	60 min	Relaxation and flexibility comprised upper and lower limb stretches and relaxation techniques based on the published regimen by Ost 1987.	Not mentioned
Valim 2003	3 times/week	20 weeks	Not mentioned	30 s	45 min	Stretching program included 17 exercises using both muscles and joints in a general way, including face, cervical, trunk, and extremities.	Not mentioned
Flexibility versus resistance
Assumpção 2017	2 times/week	12 weeks	Stretch intensity was increased gradually to the point of moderate discomfort.	30 s	40 min	Supervised program focusing on large muscles (triceps surae, gluteus, ischiotibial, paravertebral, latissimus dorsi, hip adductor, pectoralis).	Not mentioned
Gavi 2014	2 times/week	16 weeks	Not mentioned	30 s	45 min	Stretching program included major muscle groups. Authors reference the stretching protocol used by Valim 2003.	Not mentioned
Jones 2002	2 times/week	12 weeks	Not mentioned	60 s	60 min	Stretching program included stretches performed in standing, sitting, or lying positions.	Not mentioned
Flexibility versus other
Altan 2009	3 times/week	12 weeks	Not mentioned	6 s	60 min	Non‐weight bearing stretching of cervical, shoulder, thoracic, lumbar, gluteal leg and crusis muscle	Not mentioned
Amanollahi 2013	3 times/week	4 weeks	Not mentioned	30 s	Not mentioned	Non‐weight bearing stretching of shoulder blade musculature, paraspinal muscles, neck and low back muscle, hamstrings and calf muscles	Each time included 3 repetitions of each stretching exercise
Calandre 2009	3 times/week	6 weeks	Not mentioned	Not mentioned	60 min	Stretching exercises were performed on muscles over the main body area: cervical, upper and lower groups extremities, and trunk.	Not mentioned
López‐Rodríguez 2012	2 times/week	12 weeks	Not mentioned	Not mentioned	60 min	Flexibility stretching exercises that included global stretches and specific to different muscular areas of the body	Not mentioned

Comparator interventions included land‐based aerobic training (e.g. treadmill walking), resistance training (e.g. progressive training using weight machines), and other interventions (e.g. Pilates, friction massage, Tai Chi, medication, aquatic biodanza). It should be noted that most aerobic and strength training interventions included brief (typically 5 to 10 minutes) warm‐up and cool‐down exercises before and after the main exercise component. These warm‐up and cool‐down components usually included a mix of stretching exercise and light aerobic exercise.

The main comparisons assessed in this review included the following.

- Flexibility exercise training versus untreated control
- Flexibility exercise training versus land‐based aerobic exercise
- Flexibility exercise training versus resistance training
- Flexibility exercise training versus other interventions

For the purposes of this review, we were interested in interventions in which the effects of flexibility exercise training could be isolated, therefore we excluded studies that combined flexibility exercise training with other interventions or education.

Types of outcome measures

Major outcomes

Seven outcomes were designated as major outcomes: HRQoL, pain intensity, fatigue, stiffness, physical function, adverse events, and number of participants who withdrew or dropped out. Three outcomes were designated as minor outcomes: tenderness, depression, and greater than 30% improvement in pain. In selecting these outcomes, we sought the opinion of consumers involved in the team and considered the consensus statement of Choy 2009a regarding a core set of outcome measures for clinical trials in fibromyalgia, as well as the anticipated effects of flexibility exercise training on physical fitness. We extracted data for selected outcomes at baseline, end of intervention (post‐treatment), and follow‐up. Review criteria required each included study to report measurement of one or more outcomes for at least one of these time periods.

When an included study used more than one instrument to measure a particular outcome, we applied the following preferred hierarchy to choose the outcome measure for analysis.

Health‐related quality of life. This outcome consists of multidimensional indices used to measure general health status or HRQoL, or both (Choy 2009a). When included studies used more than one instrument to measure HRQoL, we preferentially extracted data from the Fibromyalgia Impact Questionnaire (FIQ Total; Bennett 2009; Burckhardt 1991), followed by the Short Form Health Survey questionnaire (either the SF‐36 total or the SF‐12 total; Busija 2011; Ware 1993) and the EuroQol‐5D (EQ‐5D) (Wolfe 1997).
Pain intensity. The International Association for the Study of Pain defined pain as “an unpleasant sensory and emotional experience associated with actual or potential tissue damage, or described in terms of such damage” (Merskey 1994). For the purposes of this review, we focused on one aspect of the pain experience, i.e. pain intensity. When more than one measure of pain intensity was reported in a single study, we preferentially extracted measures of average pain intensity (as opposed to worst, least, or current pain) assessed by visual analogue scale (VAS), FIQ Pain, McGill pain VAS followed by the Numerical Pain Rating Scale. In studies where unidimensional measures of pain intensity were not reported, we extracted composite measures that included pain intensity and interference (SF‐36 or Rand 36 Bodily Pain Scale; Ware 1993) or pain intensity and suffering from pain (Multidimensional Pain Inventory ‐ Pain Severity scale).
Fatigue. Fatigue is recognized by individuals with fibromyalgia and clinicians alike as an important symptom (Choy 2009a). Fatigue can be measured in a global manner, in which an individual rates fatigue on a single‐item scale, or using a multidimensional tool that breaks the experience of fatigue down into two or more dimensions such as general fatigue, physical fatigue, mental fatigue, reduced motivation, reduced activity, and degree of interference with activities of daily living (Boomershine 2012). We accepted both uni‐ and multidimensional measures for this outcome. When included studies used more than one instrument to measure fatigue, we preferentially extracted the fatigue VAS (FIQ Fatigue, or single‐item fatigue VAS), followed by the SF‐36 or Rand 36 Vitality subscale, the Chalder Fatigue Scale (total), the Fatigue Severity Scale, and the Multidimensional Fatigue Inventory.
Stiffness. In focus groups conducted by Arnold 2008, individuals with fibromyalgia "... remarked that their muscles were constantly tense. Participants alternately described feeling as if their muscles were ‘lead jelly’ or ‘lead Jell‐O,' and this resulted in a general inability to move with ease and a feeling of stiffness." We used a common measure of stiffness encountered in this literature, i.e. the FIQ stiffness subscale.
Physical function. This outcome focuses on the basic actions and complex activities considered “essential for maintaining independence, and those considered discretionary that are not required for independent living, but may have an impact on quality of life” (Painter 1999). Since cardiorespiratory fitness, neuromuscular attributes (e.g. muscular strength, endurance, and power), and muscle and joint flexibility are important determinants of physical function, this outcome is highly relevant as an outcome of exercise interventions. When more than one measure of physical function was available within a study, we preferentially extracted data for the FIQ physical impairment scale (Burckhardt 1991), followed by the Health Assessment Questionnaire disability scale (HAQ), the SF‐36 or Rand 36 Physical Function, the Sickness Impact Profile – Physical Disability (Bergner 1981), and the Multidimensional Pain Inventory household chores scale (Huskisson 1976; Huskisson 1983).
Adverse events. We extracted the number of participants who experienced adverse events during the intervention (i.e. injuries, exacerbations of pain and/or other fibromyalgia symptoms). If this information was not available, we extracted the nature of the adverse events in a narrative report.
Withdrawals. We reported the number of participants who withdrew or dropped out of the study for any reason.
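
The preferred-hierarchy rule used for the outcomes above amounts to taking the first instrument in an ordered list that a study actually reported. The sketch below illustrates this for fatigue; the function name and the exact label strings are hypothetical, and only the ordering comes from the text.

```python
from typing import Iterable, Optional, Sequence

# Order taken from the fatigue paragraph above; label strings are illustrative.
FATIGUE_HIERARCHY = (
    "Fatigue VAS (FIQ Fatigue or single-item)",
    "SF-36 or Rand 36 Vitality",
    "Chalder Fatigue Scale (total)",
    "Fatigue Severity Scale",
    "Multidimensional Fatigue Inventory",
)


def select_instrument(
    reported: Iterable[str], hierarchy: Sequence[str]
) -> Optional[str]:
    """Return the highest-ranked instrument the study reported, or None."""
    reported_set = set(reported)
    for instrument in hierarchy:  # earlier entries are preferred
        if instrument in reported_set:
            return instrument
    return None


if __name__ == "__main__":
    study_reports = ["Fatigue Severity Scale", "SF-36 or Rand 36 Vitality"]
    print(select_instrument(study_reports, FATIGUE_HIERARCHY))
```

The same function could be reused for the other outcomes by passing a different ordered tuple.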

Minor outcomes

The following is a rationale and preference listing of minor outcomes. Among the three outcomes designated as minor outcomes, we have included one psychological and one physical variable that could potentially improve with flexibility exercise training.

Depression. This is a common mental disorder characterized by depressed mood, loss of interest or pleasure, feelings of guilt or low self‐worth, disturbed sleep or appetite, low energy, and poor concentration. These problems can become chronic or recurrent and lead to substantial impairments in a person’s ability to attend to his or her everyday responsibilities (WHO 2017). In focus groups conducted by Arnold 2008, the emotional disturbances most commonly experienced by participants with fibromyalgia included depression and anxiety. A complete understanding of depression and how best to assess it in fibromyalgia trials is still uncertain and is an active research issue (Mease 2009). However, because people with significant depression are commonly excluded from fibromyalgia intervention studies, the discriminatory power of these instruments is underestimated (Choy 2009b). We preferentially extracted Beck Depression Inventory (BDI) Cognitive/Affective subscale scores followed by BDI total, BDI without fibromyalgia symptoms; Short Form translated SF‐36; Hamilton Depression Scale; Center for Epidemiologic Studies Depression Scale (CES‐D); Fibromyalgia Impact Questionnaire (FIQ) FIQ translated‐ depression subscale; Mental Health Inventory (MHI) depression subscale; Arthritis Impact Measurement scales (AIMS) ‐ depression subscale; Hospital Anxiety and Depression Scale ‐ depression (HADS); Symptom Checklist 90 (SCL‐90‐R) ‐ depression; and the Psychological General Well‐Being (PGWB depression score).
Tenderness. Tenderness was defined as discomfort produced as an evoked response to mechanical pressure (Dadabhoy 2008; Gracely 2003). Although there are concerns that measures of tenderness can be biased by cognitive and emotional aspects of pain perception, many studies have supported the utility of measurement of tenderness in fibromyalgia using either TP counts or pain pressure threshold (Dadabhoy 2008). A TP is identified when pressure of 4 kg is perceived as painful. When included studies used more than one instrument to measure tenderness, we preferentially extracted the TP count followed by pain pressure threshold (dolorimetry score, based on at least six of the 18 ACR TPs) and the total myalgic score (sum/mean of ordinal rating of response to thumb pressure across 18 TPs).
Improvement greater than 30% in pain. A 30% reduction is considered a benchmark for a moderately important change in pain intensity, and the consensus group Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) recommends this measure for interpreting clinical trial efficacy (Dworkin 2008). We extracted data on the number of participants who met this criterion for intervention efficacy when this information was available.

Search methods for identification of studies

The team Information Specialist conducted a comprehensive search in nine databases for studies of physical activity interventions in adults with fibromyalgia. The citations found in the electronic and manual searches were screened and then classified by the type of exercise training. This comprehensive search captured all types of physical activity intervention studies, of which only the subset classified as studies of flexibility training interventions was included in this review.

Electronic searches

We searched the following databases from database inception to 31st of December, 2017 using the methods outlined in Chapter 6 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). We applied no language restrictions. The full search strategies for each database are shown in the Appendices as indicated below.

MEDLINE (Ovid) MEDLINE In‐Process and MEDLINE 1946 to 31st of December 2017 (Appendix 3)
Embase (Ovid) Embase Classic + Embase 1947 to 31st of December 2017 (Appendix 4)
CINAHL (EBSCO) (Cumulative Index to Nursing and Allied Health Literature) 1982 to 31st of December 2017 (Appendix 5)
Cochrane Library (Wiley) 2003, Issue 1 to present (Appendix 6)
- Cochrane Database of Systematic Reviews (Cochrane Reviews)
- Database of Abstracts of Reviews of Effects (DARE)
- Cochrane Central Register of Controlled Trials (CENTRAL)
- Health Technology Assessment Database (HTA)
- NHS Economic Evaluation Database (EED)

AMED (Ovid) (Allied and Complementary Medicine Database) 1985 to 31st of December 2017 (Appendix 7)
Thesis and Dissertation Abstracts (ProQuest) 1743 to December 2017 (Appendix 8)
PEDro (Physiotherapy Evidence Database) 1929 to December 2017 (Appendix 9)
US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov (www.clinicaltrials.gov/) 2000 to 31st of December 2017 (Appendix 10)
World Health Organization International Clinical Trials Registry Platform (WHO ICTRP) (www.who.int/ictrp/en/) 2007 to 31st of December 2017 (Appendix 11)

Searching other resources

Two review authors independently reviewed reference lists from key journals; identified articles and reviews of all types of treatment for fibromyalgia; scrutinized all promising or potential references; and added appropriate titles to the search results.

Data collection and analysis

Review authors

The review authors are members of the Cochrane Musculoskeletal Group (CMSG) ‐ Physical Activity and Fibromyalgia Team (see Acknowledgements). The review authors were trained in data extraction and 'Risk of bias' assessment using a standardized orientation program. They worked independently and in pairs with at least one physical therapist in each pair to extract data. Two additional members, our team consumers, assisted at several stages of the review. They were involved in selecting the outcomes, writing the Plain language summary, and reading the final draft for content and readability. The entire team met regularly to discuss progress, clarify procedures, make decisions regarding inclusion or exclusion of studies and classification of outcome variables, and work collaboratively in the production of this review.

Selection of studies

Two review authors independently examined the titles and abstracts of studies generated from the searches using a set of criteria (Appendix 12). The team used Covidence software to assist with independent screening of literature (Covidence 2015). We retrieved the full‐text publications for all potentially relevant abstracts. All non‐English reports were translated (Amanollahi 2013; López‐Rodríguez 2012; Matsutani 2012). We then examined the full‐text reports to determine study eligibility based on the selection criteria. Disagreements between the two review authors and questions regarding interpretation of inclusion criteria were resolved by discussion or by consulting a third review author if needed.

In keeping with Rosenthal's recommendations (Rosenthal 1995), publications referring to the same primary study (what we called 'companions') but presenting follow‐up data in subsequent publications were linked and presented as one. Likewise, published studies for which protocols were found in trial registries or were published were considered companions and presented as one.

Data extraction and management

We used electronic data extraction forms developed, piloted, and refined in our previous reviews to facilitate independent data extraction and consensus (Busch 2008). Pairs of review authors independently extracted the data. Any disagreements were resolved by consensus or by involving a third person (AJB) if necessary. Two review authors (SYK, AJB) transferred data into the Review Manager 5 software file (RevMan 2014). We double‐checked that data were entered correctly by comparing the data presented in the software with the study reports. We noted in the Characteristics of included studies table whether outcome data were not reported in a usable way (Assumpção 2017); instances when the data were obtained directly from study authors (Altan 2009; Assumpção 2017; Jones 2002; López‐Rodríguez 2012; Matsutani 2012; Richards 2002); and when data were transformed or estimated from a graph (Calandre 2009). If both unadjusted and adjusted values for the same outcome were reported, we extracted the adjusted values. If the data were analyzed based on an intention‐to‐treat (ITT) sample and another sample (e.g. per‐protocol, as‐treated), we extracted the ITT data. Due to changes in the methods (e.g. risk of bias), we reassessed the studies included in the previous review (Busch 2002; Busch 2007) for this updated review.

We extracted the following data from the included studies.

Methods: study design, total duration of study and follow‐up (if applicable), and date of study.
Participants: N, n, mean age, age range, gender ratio, disease duration, diagnostic criteria, inclusion and exclusion criteria.
Interventions, comparison, concomitant treatments recording:
- for all interventions with an exercise component: frequency, duration of exercise sessions, intensity, mode, and congruence with American College of Sports Medicine (ACSM) guidelines for healthy adults (ACSM 2013);
- for interventions with a non‐exercise component: frequency, duration, and main characteristics.
Outcomes: major and minor outcomes as indicated previously; additional outcomes assessed (recorded in the Characteristics of included studies table); means and standard deviations for tests at baseline and end of intervention (post‐treatment) and follow‐up for continuous outcomes.
Characteristics of trial design as outlined in the Assessment of risk of bias in included studies section.
Country of study, language of article, records of author contacts, trials registry record or protocol, and notable declarations of interest (recorded in the Characteristics of included studies table).

Assessment of risk of bias in included studies

We assessed risk of bias of studies based on the procedures recommended in the Cochrane Handbook for Systematic Reviews of Interventions. Two review authors independently evaluated the risk of bias in each included study using a customized form based on the Cochrane 'Risk of bias' tool (Higgins 2011). The tool addresses seven specific domains: sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessors, incomplete outcome data, selective outcome reporting (including publication bias), and other sources of bias. For other sources of bias, we considered issues such as baseline inequities despite randomization.

We assessed each criterion as low, high, or unclear risk of bias according to the information provided in the studies and at times based on study author responses (Altan 2009; Assumpção 2017; Jones 2002; López‐Rodríguez 2012; Matsutani 2012; Richards 2002). We classified studies as having a low risk of bias if all key domains had low risk of bias and no serious flaws. We judged studies for which the absence of information or ambiguities prevented a determination of the potential for bias as at unclear risk of bias. In such cases, we revised our assessment if the authors responded to our requests for more information. Any disagreements between the review authors were resolved through discussion at consensus meetings. If agreement could not be reached, involvement of a third review member was sought.

Measures of treatment effect

For continuous data, we used the group post‐treatment means and standard deviations to calculate the effect sizes, employing Review Manager 5 software (RevMan 2014). We expressed effect sizes preferentially in the form of mean differences (MD) and 95% confidence intervals (95% CI). For dichotomous data, we used risk ratios (RR) and 95% CI.
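The two effect measures named above can be sketched as follows. This is a minimal illustration that assumes the usual large‐sample (normal approximation) formulas; the function names and the input values in the usage note are hypothetical and are not taken from any included study.

```python
import math

def mean_difference(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Mean difference (group 1 minus group 2) with a 95% CI.

    Uses post-treatment means and SDs and a normal approximation.
    """
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - z * se, md + z * se

def risk_ratio(events1, n1, events2, n2, z=1.96):
    """Risk ratio (group 1 versus group 2) with a 95% CI on the log scale.

    Assumes at least one event in each group (no zero-cell correction).
    """
    rr = (events1 / n1) / (events2 / n2)
    se_log = math.sqrt(1 / events1 - 1 / n1 + 1 / events2 - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper
```

For example, `mean_difference(50, 20, 30, 45, 20, 30)` returns a mean difference of 5 with an interval that spans zero, and `risk_ratio(10, 50, 5, 50)` returns a risk ratio of 2 with an interval that spans 1.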

We used Review Manager 5 software to generate forest plots to display the results (RevMan 2014). We used data from the latest follow‐up assessments when evaluating long‐term effects.

In the comments column of the summary of findings Table for the main comparison, we provided the relative change and the number needed to treat for an additional beneficial outcome (NNTB). The NNTB was provided only when the outcome showed a clinically important difference. We calculated the NNTB for continuous measures using the Wells calculator (available at the CMSG Editorial office). For dichotomous outcomes, such as adverse events, we planned to calculate the NNTB from the untreated control group event rate and the risk ratio using the Visual Rx NNTB calculator. Data were not available, and we were unable to calculate the NNTB for dichotomous outcomes.

In accordance with the Philadelphia Panel 2001, we assumed a minimal clinically important difference (MCID) of 15 points on a 100‐point continuous pain scale and a relative difference of 15% on all functional scales as being clinically relevant. The MCID was used in the calculation of NNTB for continuous outcomes. For dichotomous outcomes, the absolute risk difference was calculated using the risk difference statistic in Review Manager 5, with the result expressed as a percentage (RevMan 2014). For continuous outcomes, the absolute benefit was calculated as the improvement in the intervention group minus the improvement in the untreated control group, in the original units. We calculated relative change as per CMSG procedures: the absolute change divided by the baseline mean of the comparator group (taken from the most heavily weighted study).
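The absolute and relative change calculations described here reduce to two divisions, sketched below. The mean difference of 4.14 mm is the health‐related quality of life estimate from the summary of findings table; the baseline mean of 55.0 mm is a hypothetical value chosen for illustration only and is not a figure reported in this section. The function names are likewise illustrative.

```python
def absolute_change_percent(mean_difference, scale_min=0.0, scale_max=100.0):
    """Absolute change expressed as a percentage of the scale range."""
    return 100.0 * mean_difference / (scale_max - scale_min)

def relative_change_percent(mean_difference, comparator_baseline_mean):
    """Relative change: absolute change divided by the comparator baseline mean."""
    return 100.0 * mean_difference / comparator_baseline_mean

# Hypothetical baseline mean (55.0 mm); MD taken from the summary of findings.
abs_pct = absolute_change_percent(4.14)        # about 4% of a 0-100 mm scale
rel_pct = relative_change_percent(4.14, 55.0)  # about 7.5% of the baseline mean
```

Under these assumptions, an effect that is small in absolute terms can look larger in relative terms when the baseline mean sits well below the top of the scale, which is why the review reports both.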

Unit of analysis issues

We included studies with two or more parallel groups and examined any relevant comparison that allowed the evaluation of the effects of flexibility exercise training interventions on individuals with fibromyalgia. For example, a three‐arm trial comparing flexibility versus drug treatment versus friction massage could appear in two separate analyses: flexibility versus medications, and flexibility versus friction massage. For details see the Characteristics of included studies table.

Dealing with missing data

When numerical data were missing, we contacted the author requesting the additional data required for analysis. We used open‐ended questions to obtain the information needed to assess risk of bias and for the treatment effect. When numerical data were available only in graphic form, we used Engauge Digitizer version 5.1 to estimate means and standard deviations by digitizing data points on the graphs (Mitchell 2012).

For dichotomous outcomes (e.g. number of withdrawals), we calculated the withdrawal rate using the number of participants randomized in the group as the denominator. For continuous outcomes (e.g. post‐treatment pain score), we calculated the MD or standardized mean difference (SMD) based on the number of individuals analyzed at that time point. When the number of individuals analyzed was not presented for each time point, we used the number of individuals randomized to each group at baseline. When means were not reported, we used medians.

When post‐treatment standard deviations were unavailable, we used the standard deviations of the pre‐test scores as estimates. When the variance was expressed using statistics other than standard deviation (e.g. standard error, confidence interval, P value), we computed standard deviations according to the methods recommended in Chapter 7 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). When we were unable to derive missing standard deviations using the above methods, we planned to impute them from other studies in the meta‐analysis; however, this was not necessary for this review.
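The two most common conversions can be sketched as follows for a single group mean. The sketch assumes a normal approximation (z = 1.96 for a 95% CI), which is reasonable for large groups; for small groups a t‐distribution multiplier would normally replace 1.96. Function names are illustrative.

```python
import math

def sd_from_se(se, n):
    """Standard deviation from the standard error of a single group mean."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, z=1.96):
    """Standard deviation from a confidence interval around a single group mean.

    The CI width equals 2 * z * SE, so SE = (upper - lower) / (2 * z).
    """
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)
```

For a group of 25 participants, a standard error of 2.0 and a 95% CI of 46.08 to 53.92 both correspond to a standard deviation of 10.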

Assessment of heterogeneity

We assessed clinical and methodological diversity in terms of participants, interventions, outcomes, and study characteristics for the included studies to determine whether a meta‐analysis was appropriate. We did this by reviewing data obtained from data extraction tables. We assessed heterogeneity through visual inspection of the forest plot to assess for obvious differences in result between the studies, and through the use of I² and Chi² statistical tests. As recommended in the Cochrane Handbook for Systematic Reviews of Interventions (Deeks 2017), we interpreted I² values as follows:

0% to 40%: might not be important;
30% to 60%: moderate heterogeneity;
50% to 90%: substantial heterogeneity;
75% to 100%: considerable heterogeneity.

We interpreted the Chi² test with a P value ≤ 0.10 as indicating statistical heterogeneity.
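As a rough sketch of how these thresholds apply, the I² statistic can be derived from the Chi² heterogeneity statistic (Cochran's Q) and its degrees of freedom, and then matched against the overlapping bands listed above. The function names are illustrative; a value that falls in an overlap is returned with both labels rather than being forced into one.

```python
def i_squared(q, df):
    """I² (%) from Cochran's Q and its degrees of freedom (studies minus 1)."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)

# Overlapping interpretation bands, as listed in the text above.
BANDS = [
    (0, 40, "might not be important"),
    (30, 60, "moderate"),
    (50, 90, "substantial"),
    (75, 100, "considerable"),
]

def interpret_i_squared(value):
    """Return every band label whose range contains the I² value."""
    return [label for low, high, label in BANDS if low <= value <= high]
```

For instance, Q = 10 with 4 degrees of freedom gives I² = 60%, which sits in both the moderate and the substantial band; this is the kind of ambiguous zone the next paragraph refers to.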

When we removed a trial from the analysis, we noted changes in both heterogeneity and effect size. Because I² involves overlapping categories (e.g. 0% to 40%, 30% to 60%), or 'ambiguous' zones, we explored statistical heterogeneity thoroughly when noted (e.g. I² between 50% and 60%). Given that values between 50% and 60% fall into an ambiguous zone, if we could find no apparent causes of heterogeneity, we kept the trial in the analysis and documented our decision.

Assessment of reporting biases

We planned to draw contour‐enhanced funnel plots for each meta‐analysis to assess publication reporting bias if a large enough sample of studies (i.e. more than 10 studies) was available and included in the meta‐analysis (Sterne 2017).

If the randomized controlled trial (RCT) protocol was available, we compared the outcomes in the RCT protocol versus the outcomes in the published report. For studies published after 1 July 2005, we searched the WHO ICTRP and ClinicalTrials.gov for the RCT protocol.

We compared the fixed‐effect estimate against the random‐effects model to assess the possible presence of small‐sample bias (i.e. by which intervention effect is more beneficial in smaller studies) in the published literature. In the presence of small‐sample bias, the random‐effects estimate of the intervention is more beneficial than the fixed‐effect estimate (Sterne 2017).

Data synthesis

When two or more studies reported the same outcome and interventions were deemed sufficiently homogeneous, we pooled the data (meta‐analysis) using Review Manager 5 (RevMan 2014). Before pooling data, we ensured that the directionality of the data permitted pooling; we arithmetically reversed selected scales as needed so that higher values consistently had the same meaning. We ensured that scaling factors were consistent to permit calculations of MD (e.g. 10‐centimeter scales expressed in millimeters to match 100‐millimeter scales). We presented results grouped by common comparator, for example flexibility versus aerobics, flexibility versus resistance training, and flexibility versus other comparators. We included all studies for adverse events and withdrawals.
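The harmonization steps described above (reversing direction and matching units) can be sketched as below. This is an illustration only: the function names and example values are hypothetical, and the sketch assumes simple linear scales with known minimum and maximum.

```python
def reverse_scale(mean, scale_min, scale_max):
    """Reverse a scale so that the opposite end becomes 'best'.

    The standard deviation is unchanged by reversal, so only the mean is mapped.
    """
    return scale_max + scale_min - mean

def rescale(mean, sd, factor):
    """Multiply mean and SD by a constant factor (e.g. 10 for cm to mm)."""
    return mean * factor, sd * factor

# A score of 70 on a 0-100 scale where higher is better becomes 30 when 0 is best.
reversed_mean = reverse_scale(70, 0, 100)
# A 10-centimeter VAS reading of 5.2 cm (SD 2.1 cm) expressed in millimeters.
mean_mm, sd_mm = rescale(5.2, 2.1, 10)
```

Both transformations are linear, so a mean difference computed after harmonization is directly comparable across studies.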

'Summary of findings' table

We used the GRADEpro software (GRADEpro GDT 2015) to prepare the 'Summary of findings' table for major outcomes for flexibility exercise training versus land‐based aerobic training. In the 'Summary of findings' table, we integrated analysis of the certainty of the evidence and magnitude of effect of the interventions. We used the five GRADE considerations (study limitations, consistency of effect, imprecision, indirectness, and publication bias) to assess the certainty of the body of evidence at one of four levels, as follows.

High certainty: further research is very unlikely to change our confidence in the estimate of effect.
Moderate certainty: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low certainty: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low certainty: research shows substantial uncertainty about the estimate.

We downgraded the overall rating of the certainty of the evidence for the study (outcome by outcome) by at least one grade (using GRADE considerations) if the study had high or unclear risk of bias in at least one domain. We assigned GRADE certainty ratings separately for the seven major outcomes. Because of the comprehensive nature of the outcome variable of HRQoL, we gave it primacy over all other variables in the 'Summary of findings' table and the Plain language summary.

Subgroup analysis and investigation of heterogeneity

There were insufficient studies to conduct subgroup analysis as indicated in the review protocol (Busch 2015).

We assessed statistical heterogeneity among the trials using the heterogeneity statistics (Chi² test and I² statistic). We considered P values < 0.01 or I² > 50% to be indicative of significant heterogeneity. In the case of P value < 0.01 or I² > 50% (or both), we used a random‐effects model instead of the fixed‐effect model for meta‐analysis. In addition, in the case of statistical heterogeneity, we scrutinized the studies for sources of clinical heterogeneity and methodological differences.
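A minimal sketch of the fixed‐effect versus random‐effects choice follows. It assumes inverse‐variance weighting and the DerSimonian‐Laird estimator of between‐study variance; the review does not name the estimator, so that part is an assumption. Only the I² > 50% criterion is applied, because computing the Chi² P value would need a statistics library. All names and inputs are illustrative.

```python
def pool(effects, ses):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects estimates."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    # Cochran's Q and the between-study variance (tau squared).
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    random = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return {"fixed": fixed, "random": random, "q": q, "tau2": tau2, "i2": i2}

def choose_model(result, i2_threshold=50.0):
    """Select the random-effects model when I² exceeds the threshold."""
    return "random" if result["i2"] > i2_threshold else "fixed"
```

With two hypothetical studies, `pool([0.0, 10.0], [1.0, 2.0])` gives a fixed‐effect estimate of 2.0 and a random‐effects estimate of 4.85: the random‐effects model gives more weight to the smaller, less precise study, which is why comparing the two estimates can hint at small‐sample effects.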

Sensitivity analysis

We planned to perform sensitivity analyses to investigate the impact of statistical heterogeneity and methodological weakness (i.e. high or unclear risk of selection bias and detection bias, or attrition rates > 20%).

Results

Description of studies

See Characteristics of included studies; Characteristics of excluded studies; and Characteristics of studies awaiting classification

Results of the search

Our searches identified a total of 6530 records. After removal of 2771 duplicates, 3759 records remained. We excluded 3478 records based on citation and abstract screening. We assessed 255 full‐text articles, 1 thesis, and 25 trial registry records for eligibility. We excluded 96 full‐text articles and 1 trial registry record. After assessing full‐text physical activity articles for the type of intervention, we excluded 140 articles, 5 published study protocols, and 22 trial registry records because the intervention type was not flexibility. We included 14 full‐text publications (12 primary studies and 2 companion papers), 1 thesis, and 2 trial registry records. For details see Figure 1.

Figure 1

Study flow diagram.
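The counts reported in the search results can be cross‐checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable and function names are illustrative.

```python
def flow_checks():
    """Check that the reported study flow counts are internally consistent."""
    identified = 6530
    duplicates = 2771
    after_deduplication = 3759
    excluded_at_screening = 3478
    assessed = 255 + 1 + 25                  # full texts, thesis, registry records
    excluded_ineligible = 96 + 1             # full texts, registry record
    excluded_not_flexibility = 140 + 5 + 22  # articles, protocols, registry records
    included = 14 + 1 + 2                    # publications, thesis, registry records
    return {
        "deduplication": identified - duplicates == after_deduplication,
        "screening": after_deduplication - excluded_at_screening == assessed,
        "eligibility": assessed
        == excluded_ineligible + excluded_not_flexibility + included,
    }
```

Each of the three checks evaluates to true: 281 records reached full assessment, and the exclusions (97 and 167) plus the 17 included records account for all of them.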

Included studies

Fourteen full‐text reports, 2 registry records, and 1 thesis describing 12 unique flexibility exercise training studies met our selection criteria and were considered in this review (Altan 2009; Amanollahi 2013; Assumpção 2017; Bressan 2008; Calandre 2009; Gavi 2014; Jones 2002; López‐Rodríguez 2012; Matsutani 2012; McCain 1988; Richards 2002; Valim 2003). We used the two registry records (hereafter described as 'RCT protocols') to assess the certainty of studies.

The included studies were published between 1988 and 2017 and were conducted in seven different countries, as follows: Canada (1 study), the United States (1 study), Turkey (1 study), Brazil (4 studies), Iran (1 study), Spain (3 studies), and the United Kingdom (1 study). Nine of the 12 studies were published in English, with the remaining published in Spanish (López‐Rodríguez 2012), Farsi (Amanollahi 2013), and Portuguese (Matsutani 2012). We contacted the authors of seven studies to request additional information needed to assess risk of bias, exercise intervention, and/or treatment effect (Altan 2009; Amanollahi 2013; Assumpção 2017; Jones 2002; López‐Rodríguez 2012; Matsutani 2012; Richards 2002). We received responses from the authors of six studies (Altan 2009; Assumpção 2017; Jones 2002; López‐Rodríguez 2012; Matsutani 2012; Richards 2002), and no response from the author of Amanollahi 2013. The outcomes extracted for all included studies are presented in Table 2.

Table 2. Outcome measures used for analysis in the included studies

Outcome	Name of instrument or index/subscale
Health‐related quality of life	FIQ Total¹ (0 to 100)
Pain intensity	Current pain (VAS), FIQ pain¹ (VAS), SF‐36 bodily pain
Fatigue	FIQ fatigue¹ (0 to 100), SF‐36 Vitality (0 to 100)
Stiffness	FIQ stiffness¹ (0 to 100)
Physical function	FIQ physical function¹ (0 to 100), SF‐36
Depression	Beck Depression Inventory (0 to 63), FIQ depression¹ (0 to 100)
Tenderness	Tender point count (0 to 18), total myalgic score
Adverse events	Not a standardized instrument or index/narrative information

FIQ: Fibromyalgia Impact Questionnaire; SF‐36: 36‐item Short Form Health Survey; VAS: visual analogue scale

¹The revised FIQ scale (Bennett 2009) and any language‐translated version of the FIQ (Portuguese version; Assumpção 2017) were considered to be equivalent to the original version of the FIQ (Burckhardt 1991).

Two studies had more than two study arms (Amanollahi 2013; Assumpção 2017). For details see the Characteristics of included studies section.

Participants

This review included 743 participants. Seven studies included only female participants (n = 448); one study included both male and female participants (Calandre 2009); and four studies did not specify the gender of participants (López‐Rodríguez 2012; Matsutani 2012; McCain 1988; Richards 2002). The average duration of disease or symptoms since diagnosis ranged from 3 to 10 years. Nine studies did not report this information (Altan 2009; Amanollahi 2013; Assumpção 2017; Bressan 2008; Gavi 2014; López‐Rodríguez 2012; Matsutani 2012; McCain 1988; Valim 2003). Based on 11 studies that provided mean ages and ranges, the average age of participants was 48.6 years, ranging from 35.8 to 56 years (Richards 2002 did not provide mean ages, only median).

Fibromyalgia diagnosis was based on ACR 1990 criteria (Wolfe 1990) in all but one study (McCain 1988), where participants had to fulfill the diagnostic criteria of Smythe 1979.

The inclusion criteria for the studies included: age; diagnosis of fibromyalgia; willingness to keep their pharmacological treatment constant during the study period and not start new exercise or alternative therapies; being a patient of the study’s health center; able to understand the procedures and follow the basic orientation given; pass the treadmill stress test (used to determine the effects of exercise on the heart; electrical activity of the heart is monitored during the test); provide consent; being sedentary women; never previously treated for fibromyalgia; newly diagnosed with fibromyalgia.

The exclusion criteria for the studies included: presence of an accompanying rheumatoid disease; unstable hypertension; severe cardiopulmonary problems; psychiatric disorders affecting participant compliance; infection; fever; severe physical impairment; inflammatory disease, uncontrolled endocrine diseases; allergic diseases (including allergy to chlorine); pregnancy; malignancy; inadequate cognitive level to understand the orientations and procedures; those who had never attended a swimming pool; had any disease susceptible to worsening with warm‐water exercise; respiratory, metabolic, and rheumatic disease that could limit exercise; disease associated with autonomic dysfunction (e.g. arterial hypertension, diabetes); use of medications such as moderate or high dose of beta blockers, calcium channel blockers, antihypertensive, anticonvulsant, non‐tricyclic antidepressants, and opioid analgesics; exercise within the last three months or current participation in a regular exercise program; inability to understand questionnaires; positive treadmill test (e.g. abnormal heart activity detected during exercise on the treadmill or myocardial ischemia detected); receipt of social security benefits; neurological or renal disease that would preclude involvement in an exercise program; current cigarette smoking; score ≥ 29 on the Beck Depression Inventory modified for fibromyalgia; missing 14 or more sessions or change in pharmacological treatment during the study; history or suspicion of neoplasia; amitriptyline within previous three months; ischemic heart disease; symptomatic cardiac arrhythmias; exercise‐induced asthma; individuals for whom an alternative medical diagnosis could explain current symptoms; inability to attend classes; inability to co‐operate; body mass index > 35; hyperthyroidism.

Interventions

Descriptions of trial interventions, including congruence with the ACSM criteria for flexibility in healthy adults (ACSM 2013), are detailed in the Characteristics of included studies section and in Table 1, Table 3, and Table 4.

Table 3. Detailed description of exercise protocol

Study	Group (naming of the intervention as described by author)	Flexibility	Aerobic	Strength	Other
Altan 2009	Length: 24 weeks 1. HOME EXERCISE 1 h, 3/week RELAXATION/STRETCHING 1 h, 3/week 2. PILATES 1 h, 3/week	1. Muscle groups/exercises: stretching of cervical, shoulder, thoracic, lumbar, gluteal, leg and cruris muscle groups. Holding each stretch for 6 s and relaxed for 4 s 2. None	1. None 2. None	1. None 2. The protocol comprised 9 modules covering postural education, search for neutral position, sitting exercise, antalgic exercises, and breathing education. Equipment: resistance bands and 26‐centimeter Pilates balls were used as supportive equipment. The following components were included in the exercises: resistance and stabilization, flexibility and range of motion, proper body alignment, balance, co‐ordination, and body awareness. 1‐hour program (5 min breathing, 10 min warm‐up, 35 min conditioning, 10 min cool‐down)	1. None 2. None
Amanollahi 2013	Length: 4 weeks 1. FLEXIBILITY 3/week 2. MEDICATION 3/day and 1/day 3. FRICTION MASSAGE 3/week	1. Static and non‐weight‐bearing stretching of shoulders blade musculature, paraspinal muscles, neck and low back muscle, hamstrings and calf muscles. 3 reps with 30 s holds 2. None 3. None	1. None 2. None 3. None	1. None 2. None 3. None	1. None 2. 400 mg ibuprofen (Aria Pharmaceutical Co., Iran) 3 x/day and 25 mg nortriptyline (Darou Pakhsh Pharmaceutical Mfg. Co, Iran) 1/day 3. 3 30‐second friction massages using the second and third fingers with a pressure of approximately 0.5 to 1 kg/point on the painful spot so that a mild pallor occurred on the practitioner’s nails
Assumpção 2017	Length: 12 weeks 1. FLEXIBILITY 2/week 2. RESISTANCE 2/week 3. CONTROL	1. Supervised program focusing on large muscles (triceps surae, gluteus, ischiotibial, paravertebral, latissimus dorsi, hip adductor, pectoralis). In early stages 3 reps, from fifth week 4 reps, from ninth week 5 reps; intensity of stretch was gradually increased to the point of moderate discomfort, with 30 s holds, for 40 min. 2. None 3. None	1. None 2. None 3. None	1. None 2. Dumbbells for upper limbs and shin pads for lower limbs; exercises targeted triceps surae, quadriceps, hip adductors and abductors, hip flexors, elbow flexors and extensors, pectoralis major, and rhomboids. Duration of 40 min (5 min breathing, 10 min warm‐up, 35 min conditioning, 10 min cool‐down); first 2 sessions there was no load; 0.5 kg was added each week if participant identified the effort as slightly intense on the Borg Scale (score = 13); 8 reps 3. None	1. None 2. None 3. None
Bressan 2008	Length: 8 weeks 1. STRETCHING 1/week 2. PHYSICAL CONDITIONING EXERCISES 1/week	1. Static stretches of triceps surae, ischiotibial, gluteal, paravertebral, latissimocondyloideus, pectoral, trapezius, and respiratory muscles. In addition, stretching at home was recommended. Exercises were performed in a series of 5 repetitions, with 30 s holds for 40 to 45 min. 2. None	1. None 2. Walking for a period of 30 min using a motorized treadmill (5 min warm‐up, 25 min walking, 5 min rest). The walking speed was determined at 60% to 75% of the maximum HR, deducting participant's age from 220.	1. None 2. None	1. None 2. None
Calandre 2009	Length: 6 weeks 1. STRETCHING (in water) 1 h, 3/week 2. TAI CHI (in water) 1 h, 3/week	1. Training was done in a pool with water heated at 36 °C and was preceded by a shower with warm water (34.5 °C to 35.5 °C). In order to facilitate the stretching, participants were given 1‐meter‐long wooden sticks. Stretching was performed over the muscles of main body areas: cervical area, upper and lower extremities, and trunk. 2. None	1. None 2. None	1. None 2. None	1. None 2. Participants were taught the 16 movements which constitute the Tai Chi therapy without the assistance of additional material. Tai Chi is performed standing in shoulder‐depth water using a combination of deep breathing and slow, broad movements of the arms, legs, and torso.
Gavi 2014	Length: 16 weeks 1. FLEXIBILITY 45 min, 2/week 2. RESISTANCE TRAINING 45 min, 2/week	1. Stretching program included the major muscle groups. Valim 2003 is referenced for stretching program. 2. None	1. None 2. Resistance training group received supervised progressive training in the standing and sitting positions using weight machines. The intensity was moderate, with an overload of 45% of the estimated 1 RM, calculated based on maximal repetitions. 8 major groups were trained (quadriceps femoris, hamstrings, biceps brachii, triceps brachii, pectoral, calf, deltoid, and latissimus dorsi) in 12 different exercises, with 3 sets of 12 repetitions (leg press, leg extension, hip flexion, pectoral fly, triceps extension, shoulder flexion, leg curl, calf, pulldown, shoulder abduction, biceps flexion, and shoulder extension).	1. None 2. None	1. None 2. None
Jones 2002	Length: 12 weeks 1. FLEXIBILITY 1 h, 2/week 2. STRENGTH 1 h, 2/week	1. The muscles included in the protocol were gastrocnemius, tibialis anterior, quadriceps, hamstrings, gluteus, abdominals, erector spinae, pectorals, latissimus dorsi, rhomboids, deltoids, biceps, triceps. Static stretch, participant controlled intensity of stretches. 10 min warm‐up, 40 min stretching, 10 min cool‐down of guided imagery and relaxation 2. Warm‐up and cool‐down	1 and 2 warm‐up	1. None 2. The muscles included in the protocol were gastrocnemius, tibialis anterior, quadriceps, hamstrings, gluteus, abdominals, erector spinae, pectorals, latissimus dorsi and rhomboids, deltoids, biceps, triceps. Equipment used: 1‐ to 3‐pound weights and/or surgical tubing. Concentric/eccentric contractions with minimized work during eccentric phase. Intensity and progression directed by participant. Single set throughout, repetitions progressed from 4 or 5 to 12. Participants encouraged to decrease activity during fibromyalgia flares. 1‐hour program including 5 min warm‐up, 45 min strengthening, 10 min cool‐down	1. None 2. None
López‐Rodríguez 2012	Length: 12 weeks 1. (CONTROL) FLEXIBILITY 1 h, 2/week 2. EXPERIMENTAL GROUP biodanza 1 h, 2/week	1. Flexibility stretching exercises that included global stretches and stretches specific to different muscular areas of the body 2. None	1. None 2. Biodanza in the water with a water temperature of approximately 29 °C, preceded by a shower at 33 °C to 35 °C; biodanza‐type movements such as walking and slow movements of the upper and lower extremities; cool‐down stretching. The duration of the intervention was 60 min (10 min warm‐up, 40 min biodanza, 10 min cool‐down).	1. None 2. None	1. None 2. None
Matsutani 2012	Length: 8 weeks 1. STRETCHING 45 min, 1/week 2. AEROBIC 30 min, daily	1. Static stretching exercises were performed in a segment of the muscle groups: triceps leg, gluteal, iliopsoas, hamstring, paraspinal, latissimus dorsi, diaphragm, adductor pubic associated with lumbar pelvic movements, trapezius, and major and minor pectoralis. All exercises emphasized breathing and postural alignment. Static stretches were held for 30 s, repeated 4 times with 30 s rest, and progressed from lying to sitting to standing upright or in flexion. A mirror was used as an aid to the perception of movements of the upper limbs and postural alignment. 2. None	1. None 2. A treadmill walk was performed with intensity defined according to HR, between 60% and 70% of maximum HR for age (formula used: HR max = 220 − age).	1. None 2. None	1. None 2. None
McCain 1988	Length: 20 weeks 1. FLEXIBILITY 1 h, 3/week 2. AEROBIC EXERCISE 1 h, 3/week	1. Exercises consisted of flexibility maneuvers such that sustained HR responses greater than 115 beats per min were not attained. 2. None	1. None 2. After a 10‐minute preliminary warm‐up exercise, individuals were subjected to sustained HR elevation training through the use of a bicycle ergometer. Heart rate was maintained in excess of 150 beats per minute for gradually increasing time periods.	1. None 2. None	1. None 2. None
Richards 2002	Length: 12 weeks 1. RELAXATION AND FLEXIBILITY 1 h, 2/week 2. AEROBIC EXERCISE 1 h, 2/week	1. Relaxation and flexibility comprised upper and lower limb stretches and relaxation techniques based on the published regimen by Ost 1987. As the classes proceeded, more techniques were introduced progressing through progressive muscle relaxation, release‐only relaxation and visualization, cue‐controlled relaxation, and differential relaxation. 2. None	1. None 2. Exercise therapy comprised an individualized aerobic exercise program, mostly walking on treadmills and cycling on exercise bicycles. Each individual was encouraged to steadily increase the amount of exercise as tolerated.	1. None 2. None	1. None 2. None
Valim 2003	Length: 20 weeks 1. STRETCHING EXERCISE GROUP 45 min, 3/week 2. AEROBIC EXERCISE GROUP 45 min, 3/week	1. 17 static exercises using both muscles and joints in a general way, including face, cervical, trunk, and extremities. Exercises were chosen to provide flexibility without increasing HR. Each maximum position was sustained for 30 s. 2. None	1. None 2. Exercise group underwent a walking program monitored with frequency meters and supervised by a physiotherapist. The walking speed (training load) was determined by the training HR. Training HR was defined as the load beat immediately preceding the one in which the anaerobic threshold occurred. Each training session was preceded by a warm‐up period in which participants were instructed to walk freely and slowly for 5 to 10 min. After each session the participants were placed in a circle and performed rhythmic movements, to promote cooling off, for 5 min.	1. None 2. None	1. None 2. None

HR: heart rate; RM: repetition maximum; Max: maximum

See: Summary of findings for the main comparison Flexibility exercise training compared with aerobic exercise training for adults with fibromyalgia

Table 4. Congruence with 2013 ACSM flexibility criteria for healthy adults

Author, year	Met ACSM 2013 criteria
	Frequency	Intensity	Time	Type	Volume	Pattern
	2 to 3 d/week with daily being most effective	Stretch to the point of feeling tightness or slight discomfort	10 s to 30 s	A series of flexibility exercises for each of the major muscle‐tendon units	60 s of total stretching time for each flexibility exercise	2 to 4 repetitions
Altan 2009	Yes	Unclear	No	Yes	Unclear	Unclear
Amanollahi 2013	Yes	Unclear	Yes	Yes	Yes	Yes
Assumpção 2017	Yes	Yes	Yes	Yes	Unclear	Unclear
Bressan 2008	No	Unclear	Yes	Yes	Yes	Yes
Calandre 2009	Yes	Unclear	Unclear	Yes	Unclear	Unclear
Gavi 2014	Yes	Unclear	Yes	Yes	Unclear	Unclear
Jones 2002	Yes	Unclear	Yes	Yes	Unclear	Unclear
López‐Rodríguez 2012	Yes	Unclear	Unclear	Yes	Unclear	Unclear
Matsutani 2012	No	Unclear	Yes	Yes	Yes	Yes
McCain 1988	Yes	Unclear	Unclear	Unclear	Unclear	Unclear
Richards 2002	Yes	Unclear	Unclear	Yes	Unclear	Unclear
Valim 2003	Yes	Unclear	Yes	Yes	Unclear	Unclear

Flexibility versus untreated controls (1 study). There was only one study in this comparison (Assumpção 2017). Exercise frequency was two times a week. The duration of the intervention was 12 weeks. The intensity was described as "stretch was gradually increased to point of moderate discomfort." The duration of each stretch was 30 seconds. Static stretches were used and targeted large muscles of upper and lower body. The flexibility intervention was 40 minutes in total. Volume (estimated from duration of stretch and repetitions that gradually increased from three to five through the intervention) ranged from 90 seconds to 2.5 minutes. The program was supervised.
Flexibility versus aerobic training (5 studies). Exercise frequency ranged from one to three times a week: one time per week in Bressan 2008 and Matsutani 2012; two times per week in Richards 2002; and three times per week in McCain 1988 and Valim 2003. Duration varied from eight weeks, in Bressan 2008 and Matsutani 2012, to 20 weeks, in McCain 1988 and Valim 2003. None of the studies specified the intensity of the stretching exercises, therefore we were unable to determine if the stretches were taken to the intensity recommended by ACSM 2013, i.e. the point of feeling tightness or slight discomfort. The flexibility intervention time ranged from 40 to 60 minutes, with the average duration for each stretch 30 seconds. Two studies did not provide information on the duration of each stretch (McCain 1988; Richards 2002). Studies used static stretches for the major muscle‐tendon units of both the upper and lower limbs; however, in one study, McCain 1988, it was difficult to judge the type of stretches and body region (e.g. “Exercise consisted of flexibility maneuvers such that sustained heart rate responses greater than 115 beats per minute were not attained”). None of the studies outlined volume of stretches (i.e. total stretching time for each flexibility exercise), but we could calculate the volume from the duration of stretch and repetitions in two studies (Bressan 2008; Matsutani 2012). Only two studies provided information on the number of repetitions for each stretching exercise, which ranged from four, in Bressan 2008, to five, in Matsutani 2012. Sessions were supervised in two studies (Richards 2002; Valim 2003). It was unclear if the stretching intervention was supervised in Bressan 2008, Matsutani 2012, and McCain 1988.
Flexibility versus resistance exercise training (3 studies). Exercise frequency was two times a week for all three studies (Assumpção 2017; Gavi 2014; Jones 2002). The duration of the intervention ranged from 12 to 20 weeks. Intensity of the intervention was specified in only one study (Assumpção 2017), which stated that “the stretch intensity was increased gradually to the point of moderate discomfort.” The flexibility intervention ranged from 40 to 60 minutes. The duration of each stretch was 30 seconds in Assumpção 2017 and Gavi 2014. The duration of each stretch was 60 seconds in Jones 2002. All studies used static stretches of major muscle‐tendon units of both the upper and lower limbs. We could estimate the volume of stretches in one study (Assumpção 2017), as detailed above in the first bullet. The other studies did not provide volume of stretches and number of repetitions for each stretch. Flexibility exercise interventions were supervised in two studies (Assumpção 2017; Jones 2002). It was unclear if sessions were supervised in Gavi 2014.
Flexibility versus other comparators (4 studies, 1 with 3 parallel arms). The frequency was three times per week in all but one study (López‐Rodríguez 2012), where the frequency was two times per week. The duration of the intervention ranged from 4 to 12 weeks. None of the studies specified the intensity of the intervention. The mean length of the flexibility intervention was 60 minutes. One study did not specify the length of the flexibility intervention (Amanollahi 2013). The length of each stretch ranged from 6 to 30 seconds. Two studies did not specify the length of each stretch (Calandre 2009; López‐Rodríguez 2012). For one study, we could calculate the volume of each stretch from the number of repetitions and length of each stretch: for example, the volume of each stretch was 90 seconds based on the 3 repetitions and 30 seconds hold per stretch in Amanollahi 2013. We could not calculate the volume in the remaining studies, as the number of repetitions or the length of each stretch (or both) was not provided.
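The quantitative columns of Table 4 (frequency, time, volume, pattern) can be illustrated with a minimal Python sketch. It assumes that volume is hold time multiplied by repetitions, as in the estimates above, that the 60‐second volume criterion is a minimum, and that any frequency of 2 or more days per week meets the frequency criterion. The function names are hypothetical, and intensity and type are left out because they call for qualitative judgement.

```python
from typing import Optional


def judge(value: Optional[float], low: float, high: float) -> str:
    """Return 'Yes' or 'No', or 'Unclear' when the value was not reported."""
    if value is None:
        return "Unclear"
    return "Yes" if low <= value <= high else "No"


def acsm_congruence(days_per_week: Optional[float] = None,
                    hold_s: Optional[float] = None,
                    repetitions: Optional[int] = None) -> dict:
    """Judge the quantitative ACSM 2013 flexibility criteria for one program."""
    # Volume = total stretching time per exercise (hold time x repetitions).
    volume_s = None
    if hold_s is not None and repetitions is not None:
        volume_s = hold_s * repetitions
    return {
        # Assumption: 2 or more days per week (up to daily) meets the criterion.
        "Frequency": judge(days_per_week, 2, 7),
        "Time": judge(hold_s, 10, 30),
        # Assumption: 60 s is treated as a minimum total per exercise.
        "Volume": judge(volume_s, 60, float("inf")),
        "Pattern": judge(repetitions, 2, 4),
    }


# Amanollahi 2013: 3 sessions/week, 30 s holds, 3 repetitions (90 s volume).
amanollahi = acsm_congruence(days_per_week=3, hold_s=30, repetitions=3)
# Calandre 2009: 3 sessions/week; hold time and repetitions not reported.
calandre = acsm_congruence(days_per_week=3)
# Assumpção 2017: 30 s holds, repetitions progressing from 3 to 5.
assumpcao_volume_s = (30 * 3, 30 * 5)  # 90 s to 150 s (2.5 min)
```

For Amanollahi 2013 and Calandre 2009, the result agrees with the corresponding entries in Table 4.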

Excluded studies

We excluded 3478 records that did not meet our inclusion criteria based on title and abstract screening (Figure 1). We examined 255 full‐text articles, 1 thesis, and 25 trial registry records for possible inclusion in the review. We excluded full‐text articles (n = 96) and trial registry records (n = 1) due to unmet criteria as follows: study design (n = 56); intervention (n = 23); diagnosis (n = 6); between‐group data (n = 3); implementation of randomization (n = 5); isolation of data for fibromyalgia (n = 2); lack of designated outcomes (n = 2). The remaining 159 full‐text articles (5 of which were published study protocols), 1 thesis, and 24 trial registry records represented RCTs examining effects of physical activity interventions for fibromyalgia. Of these, we ruled out 167 because the physical intervention did not have any flexibility‐only intervention or the study was reviewed, or was designated to be reviewed, in another Cochrane Review in this series.
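As a cross‐check of the screening counts above, the following Python sketch tallies the reported exclusion reasons and the records remaining after full‐text review. It uses only the numbers stated in the paragraph; the variable names are illustrative.

```python
# Exclusion reasons reported for full-text articles and trial registry records.
exclusion_reasons = {
    "study design": 56,
    "intervention": 23,
    "diagnosis": 6,
    "between-group data": 3,
    "implementation of randomization": 5,
    "isolation of data for fibromyalgia": 2,
    "lack of designated outcomes": 2,
}
excluded_full_text, excluded_registry = 96, 1
examined = {"full-text articles": 255, "theses": 1, "trial registry records": 25}

# The itemized reasons should add up to the total number of exclusions.
total_excluded = sum(exclusion_reasons.values())

# Records left after exclusions, by record type.
remaining = {
    "full-text articles": examined["full-text articles"] - excluded_full_text,
    "theses": examined["theses"],
    "trial registry records": examined["trial registry records"] - excluded_registry,
}
remaining_total = sum(remaining.values())
```

The itemized reasons sum to 97 (96 full‐text articles plus 1 trial registry record), and the remaining counts match the 159 full‐text articles, 1 thesis, and 24 trial registry records reported.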

Risk of bias in included studies

'Risk of bias' assessments for the 12 included studies are provided in the 'Risk of bias' table in the Characteristics of included studies section and in Figure 2 and Figure 3. 'Risk of bias' assessments were based on the primary article, the protocol when available, and data supplemented by study author responses.

Figure 2

Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 3

Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Allocation

Seven of the 12 included studies used an acceptable method of random sequence generation (computer‐generated sequence, coin toss, drawing of cards or lots), and were therefore rated as at low risk of bias (Altan 2009; Assumpção 2017; Calandre 2009; Jones 2002; López‐Rodríguez 2012; McCain 1988; Richards 2002). In two studies the allocation methods used were unclear (Amanollahi 2013; Bressan 2008). Three studies used unacceptable methods for random sequence generation and were therefore judged to be at high risk of bias (Gavi 2014; Matsutani 2012; Valim 2003). For allocation concealment, we rated one study as at low risk of bias, Assumpção 2017, and five studies as at unclear risk of bias as the information provided was insufficient to permit a definitive judgement. We rated the remaining six studies as at high risk of bias, as allocation was not concealed (i.e. open‐label design), or unacceptable methods of allocation concealment (e.g. alternating allocation based on sequence of enrollment or use of a random number list) were employed (Calandre 2009; Gavi 2014; López‐Rodríguez 2012; Matsutani 2012; McCain 1988; Valim 2003). Overall, we rated the risk of allocation bias as high (~50%; Figure 2).

Blinding

In exercise studies, blinding of participants and care providers from treatment allocation is rare.

Performance bias

We rated blinding of participants and personnel (performance bias) as low risk for three studies (Calandre 2009; McCain 1988; Richards 2002); unclear risk for two studies (Amanollahi 2013; Bressan 2008); and high risk for seven studies (Altan 2009; Assumpção 2017; Gavi 2014; Jones 2002; López‐Rodríguez 2012; Matsutani 2012; Valim 2003). Overall, we rated risk of performance bias as high (~55%; Figure 2).

Detection bias

For detection bias, we assessed subjective and objective outcomes separately. Not all trials used a combination of both kinds of outcomes. While completing the 'Risk of bias' tool, we were unable to insert 'not applicable' or to leave the section blank (indicating that the outcome was not measured), thus in such cases we specified 'low risk' and inserted the comment 'not applicable: objective outcomes were not assessed.'

For self‐reported (subjective) outcomes, we rated nine studies as at high risk of bias (Altan 2009; Amanollahi 2013; Assumpção 2017; Bressan 2008; Gavi 2014; Jones 2002; López‐Rodríguez 2012; Matsutani 2012; Valim 2003). For objectively reported outcomes, four studies blinded outcome assessors to participant group assignment and were therefore rated as at low risk of bias (Altan 2009; Jones 2002; Richards 2002; Valim 2003). We rated six additional studies as at low risk for this domain; however, these should actually be regarded as 'not applicable' because either the data were not usable or no objective outcomes were measured (Amanollahi 2013; Assumpção 2017; Bressan 2008; Gavi 2014; López‐Rodríguez 2012; McCain 1988). Two studies did not blind assessors and were rated as at high risk (Calandre 2009; Matsutani 2012). Overall, we rated risk of detection bias as high (100%; Figure 2).

Incomplete outcome data

Nine studies reported complete outcome data. Bressan 2008 had no missing outcome data. Calandre 2009 analyzed data using the intention‐to‐treat (ITT) principle. Missing outcome data were balanced in numbers across intervention groups, and reasons for missing outcome data were unlikely to be related to true outcomes in Altan 2009, Amanollahi 2013, Gavi 2014, Jones 2002, McCain 1988, and Valim 2003. Richards 2002 replaced missing outcome data with last known value or baseline value. Assumpção 2017 did not use an ITT analysis and had one participant from the resistance group drop out due to increased pain; we therefore rated the risk of bias as unclear. López‐Rodríguez 2012 and Matsutani 2012 provided incomplete outcome data, therefore we rated these studies as at high risk. Overall, we rated risk of attrition bias as low (~75%; Figure 2).

Selective reporting

Registered protocols were available for four of the included studies (Assumpção 2017, clinicaltrials.gov/ct2/show/NCT01029041; Calandre 2009, clinicaltrials.gov/ct2/show/NCT00550641; Gavi 2014, clinicaltrials.gov/ct2/show/NCT02004405; López‐Rodríguez 2012, clinicaltrials.gov/ct2/show/NCT03182556). We rated three studies as having a low risk of reporting bias. One of these three studies had a trial protocol available (Assumpção 2017). Although the remaining two studies rated as at low risk of reporting bias did not have a registered trial protocol, it appeared that published reports included all expected outcomes (McCain 1988; Richards 2002). We rated one study as having a high risk of reporting bias (Calandre 2009). Calandre 2009 had some incongruence between the outcome descriptions and the results reported in the publication. For example, there was no information on tender points as an outcome, yet this was presented in the results. We rated eight out of the 12 included studies as having an unclear risk of bias for this domain (Altan 2009; Amanollahi 2013; Bressan 2008; Gavi 2014; Jones 2002; López‐Rodríguez 2012; Matsutani 2012; Valim 2003). Overall, we rated risk of reporting bias as unclear or high (~75%; Figure 2).

Other potential sources of bias

We rated risk of other potential sources of bias as low (~65%; Figure 2). We assessed eight studies as at low risk of bias for this domain (Altan 2009; Assumpção 2017; Gavi 2014; Jones 2002; López‐Rodríguez 2012; Matsutani 2012; McCain 1988; Richards 2002). We rated two studies as at unclear risk of bias due to insufficient information to judge whether an important risk of bias existed (Amanollahi 2013; Valim 2003). We assessed two studies as at high risk of other potential sources of bias. Bressan 2008 had a substantial lack of methodological information to demonstrate rigor in the study design used (e.g. blinding on several levels, allocation, randomization, the instructor/instructors used for the intervention, the level of supervision). Calandre 2009 had baseline imbalances that likely impacted the results.
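The approximate percentages quoted for each domain can be compared with the study counts given in the text. The sketch below, in Python, converts the stated counts of low, unclear, and high ratings among the 12 included studies into rounded percentages. The domain labels and function name are illustrative, and detection bias is omitted because it was assessed separately for subjective and objective outcomes. Small differences from the quoted figures are expected if those were read from Figure 2 rather than calculated.

```python
N_STUDIES = 12

# Number of studies rated low / unclear / high per domain, as stated in the text.
ratings = {
    "random sequence generation": {"low": 7, "unclear": 2, "high": 3},
    "allocation concealment": {"low": 1, "unclear": 5, "high": 6},
    "performance bias": {"low": 3, "unclear": 2, "high": 7},
    "attrition bias": {"low": 9, "unclear": 1, "high": 2},
    "reporting bias": {"low": 3, "unclear": 8, "high": 1},
    "other bias": {"low": 8, "unclear": 2, "high": 2},
}


def percentages(counts: dict) -> dict:
    """Convert rating counts for one domain into whole-number percentages."""
    if sum(counts.values()) != N_STUDIES:
        raise ValueError("counts must cover all included studies")
    return {level: round(100 * n / N_STUDIES) for level, n in counts.items()}


summary = {domain: percentages(counts) for domain, counts in ratings.items()}
```

For example, 6 of 12 studies at high risk for allocation concealment gives 50%, and 9 of 12 at low risk for attrition bias gives 75%.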

Effects of interventions

See Summary of findings for the main comparison for flexibility exercise training compared with land‐based aerobic exercise training. For the certainty of evidence in the other comparisons, see Table 5 (long‐term effects of flexibility exercise training versus aerobic exercise training), Table 6 (untreated controls), Table 7 (resistance exercise training), and Table 8 (other comparators).

Table 5. Quality of evidence—GRADE assessment: long‐term effects of flexibility exercise training versus aerobic exercise training

Certainty assessment						№ of participants		Certainty	Importance
№ of studies and study design	Risk of bias	Inconsistency	Indirectness	Imprecision	Other considerations	Flexibility	Aerobic (end of intervention)	Certainty	Importance
HRQoL (follow‐up 36 weeks after end of intervention; assessed with FIQ Total 0 to 100, lower is best)
1 randomized trial	Serious^a	Not serious	Very serious^b	Serious^c	None	67	68	⨁◯◯◯ VERY LOW	CRITICAL
Pain intensity (follow‐up 36 weeks after end of intervention; assessed with VAS 0 to 100, lower is best)
1 randomized trial	Serious^a	Not serious	Very serious^b	Serious^c	None	67	69	⨁◯◯◯ VERY LOW	CRITICAL
Fatigue, stiffness, and physical function: not measured
Withdrawals, adverse events: not reported

FIQ: Fibromyalgia Impact Questionnaire; HRQoL: health‐related quality of life; VAS: visual analogue scale
^aDowngraded one level for selection bias.
^bDowngraded two levels because flexibility was used as a proxy (i.e. flexibility exercise was used along with relaxation as the control in the study).
^cDowngraded one level for imprecision (sample size lower than 400 rule‐of‐thumb).

Table 6. Quality of evidence—GRADE assessment: flexibility intervention versus control

№ of studies and study design	Risk of bias	Inconsistency	Indirectness	Imprecision	Other considerations	№ of participants		Certainty	Importance
						Flexibility	Control (end of intervention)
Pain intensity, 0 to 100, lower is best (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	14	14	⨁⨁◯◯ LOW	CRITICAL
Physical function, 0 to 100, lower is best (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	14	14	⨁⨁◯◯ LOW	CRITICAL
Withdrawals
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	4/18 (22.2%)	2/16 (12.5%)	⨁⨁◯◯ LOW	IMPORTANT
HRQoL, fatigue, and stiffness: data were described as skewed, thus were not used
Adverse events: not measured/reported for either group

HRQoL: health‐related quality of life
^aDowngraded one level because of selection and performance bias.
^bDowngraded one level because of imprecision (sample size lower than 400 rule‐of‐thumb).

Table 7. Quality of evidence—GRADE assessment: flexibility intervention versus resistance training intervention

Certainty assessment						№ of participants		Certainty	Importance
№ of studies and study design	Risk of bias	Inconsistency	Indirectness	Imprecision	Other considerations	Flexibility	Resistance (at end of intervention)	Certainty	Importance
HRQoL, FIQ Total, 0 to 100, lower is best (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	28	28	⨁⨁◯◯ LOW	CRITICAL
Pain intensity, 0 to 100, lower is best (end of intervention)
3 randomized trials	Serious^a	Not serious	Not serious	Serious^b	None	73	79	⨁⨁◯◯ LOW	CRITICAL
Fatigue, 0 to 100, lower is best (end of intervention)
2 randomized trials	Very serious^c	Serious^d	Not serious	Serious^b	None	59	63	⨁◯◯◯ VERY LOW	IMPORTANT
Physical function, 0 to 100, lower is best (end of intervention)
2 randomized trials	Serious^a	Very serious^e	Not serious	Serious^b	None	45	51	⨁◯◯◯ VERY LOW	IMPORTANT
> 30% improvement of pain (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	5/14 (35.7%)	6/16 (37.5%)	⨁⨁◯◯ LOW	IMPORTANT
Withdrawals
3 randomized trials	Serious^a	Not serious	Not serious	Serious^b	None	19/77 (24.7%)	14/82 (17.1%)	⨁⨁◯◯ LOW	IMPORTANT
Stiffness: not measured
Adverse events: not measured/reported for flexibility training group. For resistance training group, "one subject in the resistance group interrupted participation in the study because of worsening pain" (page 13 of 22).

FIQ: Fibromyalgia Impact Questionnaire; HRQoL: health‐related quality of life
^aDowngraded one level because of selection and performance bias.
^bDowngraded one level for imprecision (sample size lower than 400 rule‐of‐thumb).
^cDowngraded two levels because of selection and performance bias.
^dDowngraded one level for inconsistency.
^eConsiderable heterogeneity (I² = 91%).

Table 8. Quality of evidence—GRADE assessment: flexibility intervention versus other comparators

Certainty assessment						№ of participants		Certainty	Importance
№ of studies and study design	Risk of bias	Inconsistency	Indirectness	Imprecision	Other considerations	Flexibility	Other comparators (end of intervention)	Certainty	Importance
HRQoL, FIQ Total, 0 to 100, lower is best (end of intervention)
3 randomized trials	Serious^a	Serious^b	Not serious	Serious^c	Studies not pooled	83	86	⨁◯◯◯ VERY LOW	CRITICAL
Pain intensity, 0 to 100, lower is best (end of intervention)
3 randomized trials	Serious^a	Serious^b	Not serious	Serious^c	Studies not pooled	153	151	⨁◯◯◯ VERY LOW	CRITICAL
Fatigue, 0 to 100, lower is best (end of intervention)
2 randomized trials	Serious^a	Serious^b	Not serious	Serious^c	Studies not pooled	59	61	⨁◯◯◯ VERY LOW	IMPORTANT
Stiffness, 0 to 100, lower is best (end of intervention)
2 randomized trials	Serious^a	Serious^b	Not serious	Serious^c	Studies not pooled	59	61	⨁◯◯◯ VERY LOW	IMPORTANT
Physical function, 0 to 100, lower is best (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^c	1 study	20	19	⨁⨁◯◯ LOW	IMPORTANT
Withdrawals
4 randomized trials	Serious^a	Serious	Not serious	Serious^c		27/188 (14.4%)	26/192 (13.5%)	⨁◯◯◯ VERY LOW	IMPORTANT
Adverse events: not reported for flexibility group. In the medication arm, 5 participants who received ibuprofen and 1 participant who received nortriptyline experienced side effects (from translated version of article).

FIQ: Fibromyalgia Impact Questionnaire; HRQoL: health‐related quality of life
^aDowngraded one level because of selection and performance bias.
^bInterventions not consistent across studies.
^cDowngraded one level for imprecision (sample size lower than 400 rule‐of‐thumb).
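The certainty ratings in Tables 5 to 8 are consistent with the usual GRADE convention in which evidence from randomized trials starts at high certainty and is downgraded one level for each 'Serious' and two levels for each 'Very serious' judgement. The Python sketch below illustrates that bookkeeping. It is a simplification that ignores the 'Other considerations' column, and the function name is hypothetical.

```python
LEVELS = ["HIGH", "MODERATE", "LOW", "VERY LOW"]
DOWNGRADE = {"Not serious": 0, "Serious": 1, "Very serious": 2}


def grade_certainty(risk_of_bias: str, inconsistency: str,
                    indirectness: str, imprecision: str) -> str:
    """Certainty for randomized-trial evidence after downgrading by domain."""
    steps = sum(
        DOWNGRADE[judgement]
        for judgement in (risk_of_bias, inconsistency, indirectness, imprecision)
    )
    # Certainty cannot fall below VERY LOW, however many levels are deducted.
    return LEVELS[min(steps, len(LEVELS) - 1)]


# Table 6, pain intensity: two single-level downgrades.
table6_pain = grade_certainty("Serious", "Not serious", "Not serious", "Serious")
# Table 5, HRQoL: four levels deducted in total, floored at VERY LOW.
table5_hrqol = grade_certainty("Serious", "Not serious", "Very serious", "Serious")
```

Here `table6_pain` evaluates to LOW and `table5_hrqol` to VERY LOW, matching the table entries.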

Flexibility exercise training versus land‐based aerobic exercise training at the end of the intervention

Major outcomes

Two studies provided data for HRQoL (Richards 2002; Valim 2003), five studies for pain intensity (Bressan 2008; Matsutani 2012; McCain 1988; Richards 2002; Valim 2003), three studies for fatigue (Bressan 2008; Richards 2002; Valim 2003), one study for physical function (Valim 2003), and one study for the major outcome of stiffness (Bressan 2008). No studies provided clear data for adverse events, and five studies provided data for all‐cause withdrawals (Bressan 2008; Matsutani 2012; McCain 1988; Richards 2002; Valim 2003).

Health‐related quality of life (self‐reported, FIQ Total, lower scores mean better health, negative numbers mean improvement): Two studies provided data for the major outcome HRQoL (Richards 2002; Valim 2003). Assessment of statistical heterogeneity among trials indicated I² = 74% (i.e. 50% to 90%: substantial heterogeneity). We evaluated heterogeneity across outcomes for these studies, and since we did not find a large degree of heterogeneity between these studies in other measures we decided to include both studies for the meta‐analysis. Mean HRQoL was 46 mm and 42 mm in the flexibility and aerobic groups, respectively. The analysis showed no evidence of a clinically important effect for flexibility exercise training compared with aerobic training postintervention (N = 193; mean difference (MD) 4.14, 95% confidence interval (CI) −5.77 to 14.05; Analysis 1.1). Absolute change was 4% worse (6% better to 14% worse). Relative change in the flexibility groups compared to the aerobic groups was 7.5% worse (10.5% better to 25.5% worse).

Pain intensity (self‐reported, 0‐to‐100 VAS, lower scores mean less pain, negative numbers mean improvement): Data on pain intensity were available for five studies (Bressan 2008; Matsutani 2012; McCain 1988; Richards 2002; Valim 2003). Mean pain was 57 mm and 52 mm in the flexibility and aerobic groups, respectively. The meta‐analysis showed no evidence of a clinically important effect with flexibility exercise training compared with aerobic exercise training postintervention (N = 266; MD 4.72, 95% CI −1.39 to 10.83; Analysis 1.2). Absolute change was 5% worse (1% better to 11% worse). Relative change in the flexibility groups compared to the aerobic groups was 6.7% worse (2% better to 15.4% worse). Heterogeneity analysis demonstrated no evidence of heterogeneity (Chi² = 0.55, P = 0.55 with df = 2; I² = 0%).
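For readers unfamiliar with the I² statistic, the following Python sketch shows the standard calculation from the heterogeneity chi‐squared statistic (Q) and its degrees of freedom. It is a generic illustration rather than the review's own software output, and the second example uses a made‐up Q value purely for demonstration.

```python
def i_squared(q: float, df: int) -> float:
    """I² (%) = max(0, (Q - df) / Q) x 100."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values reported above for pain intensity: Chi² = 0.55 with df = 2.
pain_i2 = i_squared(0.55, 2)

# Hypothetical values (not from the review) showing a non-zero result.
example_i2 = i_squared(10.0, 2)
```

Because the reported Chi² is smaller than its degrees of freedom, I² is truncated at 0%, in line with the value given in the text.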

Fatigue (self‐reported, 0‐to‐100 scale, lower scores mean less fatigue, negative numbers mean improvement): Three trials assessed fatigue as an outcome (Bressan 2008; Richards 2002; Valim 2003). We did not include data on fatigue provided by Richards 2002 in the meta‐analysis as the Chalder Fatigue Scale was not one of our accepted outcome measures. Mean fatigue was 67 mm and 71 mm in the flexibility and aerobic groups, respectively. The meta‐analysis presented no evidence of a clinically important improvement with flexibility exercise training compared to aerobic exercise training postintervention (N = 75; MD −4.12, 95% CI −13.31 to 5.06; Analysis 1.3). Absolute change was 4% better (13% better to 5% worse). Relative change in the flexibility groups compared to the aerobic groups was 6.0% better (19.4% better to 7.4% worse).

Stiffness (self‐reported, 0‐to‐100 FIQ, lower scores mean less stiffness, negative numbers mean improvement): Only one study provided data on stiffness (Bressan 2008). Although the analysis showed a clinically important improvement with flexibility exercise compared with aerobic exercise postintervention (N = 15; MD −29.6, 95% CI −51.47 to −7.73; Analysis 1.4), the 95% confidence interval included both a clinically important and unimportant change. Mean stiffness was 49 mm and 79 mm in the flexibility and aerobic groups, respectively. Absolute change was 30% better (8% better to 51% better). Relative change in the flexibility group compared to the aerobic group was 39% better (10% better to 68% better).

Physical function (self‐reported, 0‐to‐100 FIQ, lower scores mean fewer limitations, negative numbers mean improvement): Two studies assessed physical function as an outcome (Bressan 2008; Valim 2003). Data on physical function provided by Bressan 2008 were not presented on a 100‐point scale. In addition, there was insufficient information as to how Bressan 2008 reported their data for this particular measure. Consequently, their data were not used for meta‐analysis or reported here. Data from Valim 2003 showed no evidence of a clinically important improvement with flexibility exercise compared to aerobic exercise postintervention (N = 60; MD 6.04, 95% CI −3.95 to 16.03; Analysis 1.5). Mean physical function was 23 points and 17 points in the flexibility and aerobic groups, respectively. Absolute change was 6% worse (4% better to 16% worse). Relative change in the flexibility groups compared to the aerobic groups was 14% worse (9.1% better to 37.1% worse).
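The absolute changes reported for the major outcomes above are consistent with expressing the mean difference and its confidence limits as a percentage of the scale range. The Python sketch below illustrates this under the assumption of a 0‐to‐100 scale on which lower scores are better, so that positive differences count as worse for the flexibility group. The function names are hypothetical. Relative change is not reproduced because it requires a reference mean that is not restated in this section.

```python
def describe(value_pct: float) -> str:
    """Label a percentage difference on a lower-is-better scale."""
    n = round(abs(value_pct))
    if n == 0:
        return "no change"
    return f"{n}% {'worse' if value_pct > 0 else 'better'}"


def absolute_change(md: float, ci_low: float, ci_high: float,
                    scale_min: float = 0.0, scale_max: float = 100.0) -> tuple:
    """Express MD and its 95% CI limits as a percentage of the scale range."""
    span = scale_max - scale_min
    return tuple(describe(100 * x / span) for x in (md, ci_low, ci_high))


# Values as reported above (MD, lower CI limit, upper CI limit).
hrqol = absolute_change(4.14, -5.77, 14.05)
pain = absolute_change(4.72, -1.39, 10.83)
fatigue = absolute_change(-4.12, -13.31, 5.06)
stiffness = absolute_change(-29.6, -51.47, -7.73)
physical_function = absolute_change(6.04, -3.95, 16.03)
```

Each tuple lists the point estimate first, followed by the values for the lower and upper confidence limits. For HRQoL this gives 4% worse, 6% better, and 14% worse.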

Adverse events: One adverse event was described among the 132 participants allocated to flexibility training. The study reported "a patient in the flexibility group had tendinitis of the Achilles tendon, which responded to treatment with local heat and a reduction in exercise for 14 days" (McCain 1988). However, it is unclear whether the tendinitis was related to participation in the intervention.

All‐cause withdrawal: Rates for flexibility exercise training groups (n1/N1) versus aerobic exercise training groups (n2/N2) were 0/8 versus 0/7 (Bressan 2008) (not included in the analysis); 5/17 versus 8/15 (Matsutani 2012); 2/22 versus 2/20 (McCain 1988); 12/67 versus 12/69 (Richards 2002); and 10/38 versus 6/38 (Valim 2003). We found no evidence of an effect on all‐cause withdrawal between the flexibility exercise training and aerobic exercise training groups (risk ratio (RR) 0.97, 95% CI 0.61 to 1.55; Analysis 1.8). Absolute change was 1% fewer withdrawals in the flexibility groups (8% fewer to 21% more). Relative change in the flexibility groups compared to the aerobic groups was 3% fewer (39% fewer to 55% more).
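The relative change quoted for all‐cause withdrawal follows directly from the risk ratio and its confidence limits. A minimal Python sketch of that conversion is given below; the function name is hypothetical, and the pooled RR itself comes from the weighted meta‐analysis (Analysis 1.8), which is not recomputed here.

```python
def relative_change_from_rr(rr: float) -> str:
    """Express a risk ratio as a percentage change in risk versus the comparator."""
    pct = (rr - 1.0) * 100.0
    n = round(abs(pct))
    if n == 0:
        return "no change"
    return f"{n}% {'more' if pct > 0 else 'fewer'}"


# Pooled RR and 95% CI limits reported above for all-cause withdrawal.
point = relative_change_from_rr(0.97)
lower = relative_change_from_rr(0.61)
upper = relative_change_from_rr(1.55)
```

With the reported values this yields 3% fewer, 39% fewer, and 55% more withdrawals, respectively.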

Minor outcomes

Three studies evaluated the effect of flexibility exercise training on the minor outcome of depression (Bressan 2008; Matsutani 2012; Valim 2003), and four studies on tenderness (Matsutani 2012; McCain 1988; Richards 2002; Valim 2003). No studies reported data on improvement in pain greater than 30%.

Depression (self‐reported, 0‐to‐100 FIQ, lower scores mean less depression, negative numbers mean improvement): Data on depression were available for three studies (Bressan 2008; Matsutani 2012; Valim 2003). Assessment of statistical heterogeneity among trials indicated I² = 63% (i.e. 50% to 90%: substantial heterogeneity). We investigated the source of this heterogeneity by comparing this meta‐analysis to other outcomes in the same comparison. We found no other outcomes indicating substantial heterogeneity; however, clinical heterogeneity may be present due to differences in the intervention affecting this outcome, for example length of intervention, frequency of flexibility intervention, type of programs, and sample sizes (i.e. Valim 2003 had a longer intervention of 20 weeks compared to the other studies by Bressan 2008 and Matsutani 2012, which both had interventions of 8 weeks in length; Valim 2003 administered a supervised program 3 times per week, whereas Bressan 2008 and Matsutani 2012 administered a home program with a frequency of 1 time per week; Valim 2003 had a total sample size of 60, whereas Bressan 2008 and Matsutani 2012 had sample sizes of 15 and 19, respectively). The analysis of depression showed no evidence of an effect postintervention for flexibility exercise training compared with aerobic exercise training (N = 94; MD −6.28, 95% CI −19.28 to 6.71; Figure 4). Relative change in the flexibility groups compared to the aerobic groups was 19.9% better (61% better to 21.2% worse).

Figure 4

Forest plot of comparison: 1 Flexibility vs aerobic (at end of intervention), outcome: 1.6 Depression, 0‐63, lower is best (end of intervention).

Tenderness (0‐to‐18 TP count, lower score means less tenderness, negative numbers mean improvement): Four trials assessed tenderness. Matsutani 2012, Richards 2002, and Valim 2003 used the tender point count, while McCain 1988 used the total myalgic score. The meta‐analysis showed no evidence of an effect for flexibility exercise training when compared with aerobic exercise training postintervention (N = 253; standardized mean difference 0.20, 95% CI −0.08 to 0.48; Analysis 1.7). Relative change in the flexibility groups compared to the aerobic groups was 1.4% worse (0.6% better to 3.3% worse).

Improvement in pain greater than 30%: No studies reported data on this outcome.

Flexibility exercise training versus land‐based aerobic exercise training, long‐term effects

Only one study examined long‐term effects (follow‐up at 48 weeks, 36 weeks after end of 12‐week intervention) and provided data on HRQoL, pain intensity, fatigue, tenderness, and all‐cause withdrawals (Richards 2002). Data on stiffness, physical function, and adverse events were not measured at follow‐up (Analysis 1.9).

Major outcomes

Health‐related quality of life (self‐reported, FIQ Total, lower scores mean better health, negative numbers mean improvement): No evidence of an effect was found (N = 135; MD 0.40, 95% CI −5.01 to 5.81).

Pain intensity (self‐reported, 0‐to‐100 VAS, lower scores mean less pain, negative numbers mean improvement): No evidence of an effect was found (N = 136; MD 5.00, 95% CI −2.07 to 12.07).

Fatigue (self‐reported, 0‐to‐100 scale, lower scores mean less fatigue, negative numbers mean improvement): Richards 2002 measured fatigue using the Chalder Fatigue Scale, which was not one of our accepted measures; therefore, this information was not included in the review.

Minor outcomes

Tenderness (0‐to‐18 TP count, lower score means less tenderness, negative numbers mean improvement): We found evidence of an effect between flexibility and aerobic exercise training favoring aerobic exercise training postintervention (N = 136; MD 2.40, 95% CI 0.66 to 4.14).

Improvement in pain greater than 30%: No studies reported data on this outcome.

Flexibility exercise training versus untreated control at the end of the intervention

Major outcomes

One study provided data for pain intensity, physical function, and all‐cause withdrawals (Assumpção 2017). We did not use the data provided for HRQoL, fatigue, and stiffness, which were described as skewed by Assumpção 2017. This study did not provide data on adverse events.

Health‐related quality of life (FIQ, SF‐36): One study provided data for the major outcome HRQoL (Assumpção 2017), but due to skewing of the data, only medians and interquartile ranges were provided. Although the researchers found within‐group improvements in the flexibility group in median total FIQ scores, between‐group differences were not statistically significant. The pre‐test median scores in the flexibility group of 66.3 points on a 100‐point scale dropped to 57.4, versus 73.6 points at pre‐test to 72.2 points postintervention in the untreated control group (P = 0.06).

Pain intensity (self‐reported, 0‐to‐100 VAS, lower scores mean less pain, negative numbers mean improvement): One study provided data for the major outcome of pain intensity (Assumpção 2017), and no statistically significant differences between groups were found (N = 28; MD −18.00, 95% CI −37.63 to 1.63; Analysis 2.1). Relative change in the flexibility group compared to the untreated control group was 30% better (2.7% worse to 62.7% better).

Fatigue (FIQ, SF‐36): One study provided data for the major outcome of fatigue (Assumpção 2017), but due to skewing of the data, only medians and interquartile ranges were provided. Although the researchers found within‐group improvements in the flexibility group in median FIQ fatigue scores, between‐group differences were not statistically significant. The pre‐test median scores in the flexibility group of 8.6 cm on a 10‐centimeter scale dropped to 7.8 cm at post‐test, versus 9.2 cm to 8.4 cm in the untreated control group (P = 0.07).

Stiffness (FIQ): Due to skewing of data, one study provided medians and interquartile ranges for stiffness (Assumpção 2017). The pre‐test median scores in the flexibility group of 8.3 cm on a 10‐centimeter scale dropped to 5.8 cm at post‐test, versus 9.2 cm to 9.0 cm in the untreated control group. Between‐group differences were not statistically significant.

Physical function (self‐reported, 0‐to‐100 FIQ, lower scores mean fewer limitations, negative numbers mean improvement): One study provided data for the major outcome of physical function (Assumpção 2017), and no statistically significant differences between groups were found (N = 28; MD −3.33, 95% CI −16.29 to 9.63; Analysis 2.2). Relative change in the flexibility group compared to the untreated control group was 10.4% better (30.1% worse to 50.9% better).

Adverse events: No adverse event was reported by Assumpção 2017.

All‐cause withdrawal: Rates for the flexibility exercise training group (n1/N1) versus the untreated control group (n2/N2) were 4/18 versus 2/16 (Assumpção 2017). We found no significant difference in all‐cause withdrawal between flexibility exercise training and the untreated control group (RR 1.78, 95% CI 0.37 to 8.44; Analysis 2.3).
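
The risk ratio above can be reproduced from the reported withdrawal counts. The following is a minimal sketch only, assuming a Wald‐type confidence interval on the log scale with z = 1.96; the function name `risk_ratio` is illustrative and is not taken from the review or from any review software.

```python
import math


def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A versus group B with a Wald CI on the log scale.

    Assumes every event count is non-zero (no continuity correction).
    """
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Withdrawals: flexibility 4/18 versus untreated control 2/16 (Assumpção 2017).
rr, lower, upper = risk_ratio(4, 18, 2, 16)
print(f"RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these counts the sketch should agree with the reported RR 1.78 (95% CI 0.37 to 8.44) to two decimal places. The wide interval reflects the small number of events rather than any property of the calculation.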

Minor outcomes

One study evaluated the effects of flexibility exercise training on the minor outcome of improvement in pain greater than 30% (Assumpção 2017). We did not use the data for tenderness and depression, which were described as skewed.

Depression (FIQ): Data were not used due to reported skewness.

Tenderness (TP count): Data were not used due to reported skewness.

Improvement in pain greater than 30%: Upon request, Assumpção 2017 provided data for the flexibility exercise training group, but not for the untreated control group.

Flexibility exercise training versus resistance training at the end of the intervention

Major outcomes

One study provided data for HRQoL (Jones 2002), three studies for pain intensity (Assumpção 2017; Gavi 2014; Jones 2002), two studies for fatigue (Gavi 2014; Jones 2002), and two studies for the major outcome of physical function (Gavi 2014; Jones 2002). Three studies provided data on all‐cause withdrawals (Assumpção 2017; Gavi 2014; Jones 2002). No study reported complete data on adverse events or measured stiffness.

Health‐related quality of life (self‐reported, FIQ Total, lower scores mean better health, negative numbers mean improvement): One study provided data for the major outcome HRQoL (Jones 2002); data showed no evidence of an effect of flexibility exercise training compared to resistance training (N = 56; MD 5.55, 95% CI −1.80 to 12.90; Analysis 3.1). Absolute change was 6% worse (2% better to 13% worse). Relative change in the flexibility group compared to the resistance group was 11.5% worse (27.4% worse to 3.8% better).

Pain intensity (self‐reported, 0‐to‐100 VAS, lower scores mean less pain, negative numbers mean improvement): Data on pain intensity were available for three studies (Assumpção 2017; Gavi 2014; Jones 2002). The meta‐analysis showed evidence of no effect for flexibility exercise training compared with resistance training (N = 152; MD 1.84, 95% CI −4.15 to 7.83; Analysis 3.2). Absolute change was 2% worse (4% better to 8% worse). Relative change in the flexibility groups compared to the resistance groups was 2.5% worse (11.1% worse to 5.9% better). There was no evidence of heterogeneity for this meta‐analysis (Tau² = 0.00; Chi² = 0.55, df = 2 (P = 0.76); I² = 0%).
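
The reported I² can be related to the Chi² statistic and its degrees of freedom. A minimal sketch, assuming the commonly used definition I² = max(0, (Q − df)/Q) × 100%, where Q is the Chi² heterogeneity statistic; the review does not state the formula, and the function name `i_squared` is illustrative.

```python
def i_squared(q, df):
    """Percentage of variability attributed to heterogeneity rather than chance.

    q is the Chi-squared (Cochran's Q) heterogeneity statistic and df its
    degrees of freedom (number of studies minus one). Negative values are
    truncated to zero.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Pain intensity, flexibility versus resistance: Chi² = 0.55, df = 2.
print(i_squared(0.55, 2))  # Q is below df, so the result is truncated to 0.0
```

Because Chi² (0.55) is smaller than its degrees of freedom (2), the untruncated value would be negative, which is why I² is reported as 0%.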

Fatigue (self‐reported, 0‐to‐100 scale, lower scores mean less fatigue, negative numbers mean improvement): Two studies assessed fatigue as an outcome (Gavi 2014; Jones 2002). Assessment of statistical heterogeneity among trials indicated I² = 74% (i.e. 50% to 90%: substantial heterogeneity). Some of the clinical heterogeneity may be attributed to differences in the resistance training arm. The meta‐analysis showed evidence of no effect for flexibility exercise training versus resistance training postintervention (N = 122; MD 9.83, 95% CI −5.30 to 24.97; Analysis 3.3). Absolute change was 10% worse (5% better to 25% worse). Relative change in the flexibility groups compared to the resistance groups was 13.1% worse (30.8% worse to 6.54% better).

Physical function (self‐reported, 0‐to‐100 SF‐36, converted so that lower scores mean fewer limitations, negative numbers mean improvement): Two studies assessed physical function as an outcome (Assumpção 2017; Gavi 2014). Assessment of statistical heterogeneity among studies indicated I² = 91% (i.e. 75% to 100%: considerable heterogeneity). Data were checked for accuracy (the SF‐36 scale was converted appropriately so that a lower score indicated improvement; the 0‐to‐30 FIQ scale was converted to a 0‐to‐100 scale). Given the very large degree of heterogeneity, we did not perform a meta‐analysis. Assumpção 2017 compared a 12‐week flexibility intervention (N = 14) versus resistance training (N = 16) and found an effect postintervention on physical function favoring the flexibility intervention (FIQ physical functioning; MD −16.66, 95% CI −28.87 to −4.45). Gavi 2014 compared a 16‐week flexibility intervention (N = 31) versus resistance training (N = 35) and found an effect postintervention on physical function favoring resistance training (SF‐36‐Physical capacity; MD 9.47, 95% CI 0.13 to 18.81).
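
The scale conversions mentioned above can be illustrated as follows. This is a sketch under stated assumptions: it uses simple linear rescaling, and reflection for scales where a higher score is better. The review does not describe the exact conversion procedure, and the function name `rescale` is hypothetical.

```python
def rescale(score, old_min, old_max, new_min=0.0, new_max=100.0, reverse=False):
    """Linearly map a score from [old_min, old_max] to [new_min, new_max].

    With reverse=True the direction is flipped, so that a scale on which
    higher is better becomes one on which lower is better.
    """
    if old_max <= old_min:
        raise ValueError("old_max must be greater than old_min")
    fraction = (score - old_min) / (old_max - old_min)
    if reverse:
        fraction = 1.0 - fraction
    return new_min + fraction * (new_max - new_min)


# A 0-to-30 FIQ physical function score of 15 expressed on a 0-to-100 scale.
print(rescale(15, 0, 30))
# An SF-36 score of 70 (higher is better) reflected so that lower is better.
print(rescale(70, 0, 100, reverse=True))
```

Under this assumption, reflecting a scale changes the sign of a mean difference between groups, and rescaling multiplies both the mean difference and its standard deviation by the same factor.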

Adverse events: Most studies did not measure adverse events, and other studies reported them incompletely, thus we are uncertain of the estimate. One adverse event, described as "...arthrosis of the hip", was reported to have occurred after flexibility exercise training (Gavi 2014), but it is unclear whether the arthrosis was a flare‐up related to participation in the intervention.

All‐cause withdrawal: Rates for flexibility exercise training groups (n1/N1) versus resistance training groups (n2/N2) were 4/18 versus 3/19 (Assumpção 2017); 9/31 versus 5/35 (Gavi 2014); and 6/28 versus 6/28 (Jones 2002). We found no evidence of effect on all‐cause withdrawal between flexibility exercise training and resistance training groups (RR 1.43, 95% CI 0.77 to 2.67; Analysis 3.8).

Minor outcomes

Two studies evaluated the effects of flexibility exercise training on the minor outcome of depression (Gavi 2014; Jones 2002), and one study evaluated the effects on tenderness (Jones 2002).

Depression (self‐reported, 0‐to‐100 FIQ, lower scores mean less depression, negative numbers mean improvement): Data on depression were available for two studies (Gavi 2014; Jones 2002). Data showed no evidence of an effect of flexibility exercise training compared with resistance training postintervention (N = 122; MD 0.47, 95% CI −3.40 to 4.35; Analysis 3.5). Relative change in the flexibility groups compared to the resistance groups was 1.8% worse (16.8% worse to 13.2% better).

Tenderness (0‐to‐18 TP count, lower score means less tenderness, negative numbers mean improvement): One trial assessed tenderness as an outcome (Jones 2002), showing no evidence of an effect of flexibility exercise training compared to resistance training postintervention (N = 56; MD −0.32, 95% CI −2.03 to 1.39; Analysis 3.6). Relative change in the flexibility group compared to the resistance group was 1.94% better (8.4% worse to 12.3% better).

Improvement in pain greater than 30%: One study evaluated improvement in pain greater than 30% (Assumpção 2017). Rates for the flexibility exercise training group (n1/N1) versus the resistance training group (n2/N2) were 5/14 and 6/16, respectively. We found no evidence of a difference between the flexibility exercise training and resistance training groups in the proportion of participants with an improvement in pain greater than 30% (odds ratio 0.93, 95% CI 0.21 to 4.11; Analysis 3.7).
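
The odds ratio above can be checked against the reported counts. A minimal sketch, assuming a Wald‐type confidence interval on the log scale with z = 1.96; the function name `odds_ratio` is illustrative and not taken from the review.

```python
import math


def odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio of group A versus group B with a Wald CI on the log scale.

    Assumes all four cells of the 2x2 table are non-zero.
    """
    a, b = events_a, total_a - events_a  # group A: events, non-events
    c, d = events_b, total_b - events_b  # group B: events, non-events
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Improvement in pain > 30%: flexibility 5/14 versus resistance 6/16.
ratio, lower, upper = odds_ratio(5, 14, 6, 16)
print(f"OR {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these counts the sketch should agree with the reported odds ratio of 0.93 (95% CI 0.21 to 4.11) to two decimal places.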

Flexibility exercise training versus other interventions at the end of the intervention and long term

We did not pool studies as we did not consider interventions to be comparable across trials. Four studies provided data for this comparison (Altan 2009; Amanollahi 2013; Calandre 2009; López‐Rodríguez 2012). The comparisons were as follows:

flexibility exercise training versus Pilates (Altan 2009);
flexibility exercise training versus Tai Chi (Calandre 2009);
flexibility exercise training versus aquatic biodanza (López‐Rodríguez 2012); and
flexibility exercise training versus medication (i.e. ibuprofen) and flexibility exercise training versus friction massage (arm 3) (Amanollahi 2013).

Our analyses showed effect sizes on major and minor outcome variables for each of the included studies. Unless otherwise indicated, investigators measured HRQoL, pain, fatigue, and stiffness on a 0‐to‐100 scale, with lower scores best and negative numbers meaning improvement. Physical function was measured on a 0‐to‐3 scale, depression on a 0‐to‐63 scale, and tenderness on a 0‐to‐18 scale; lower scores are best, and negative numbers mean improvement. No studies reported data on improvement in pain greater than 30%. Four studies provided data on all‐cause withdrawals (Altan 2009; Amanollahi 2013; Calandre 2009; López‐Rodríguez 2012). Data on adverse events were available from Altan 2009 and Amanollahi 2013, but not always for both study arms.

Flexibility exercise training versus Pilates

End of intervention

Altan 2009 compared a 12‐week program of flexibility exercise training (described as "home exercise relaxation and stretching") (n = 25) versus Pilates (n = 25). We found evidence of an effect postintervention favoring Pilates for both HRQoL (FIQ Total, N = 49; MD 14.00, 95% CI 2.50 to 25.50; Analysis 4.1) and pain intensity (VAS; N = 49; MD 19.00, 95% CI 8.28 to 29.72; Analysis 4.2). Altan 2009 found no between‐group differences postintervention in tenderness (TP count; N = 49; MD 0.90, 95% CI −1.39 to 3.19) and reported no adverse events (i.e. injuries, exacerbations, or other) in either group. ("We observed no adverse effect of Pilates exercises.") There was no mention of adverse events in the control group (flexibility exercise training and relaxation). All‐cause withdrawal rates for the flexibility exercise training group (n1/N1) versus the Pilates group (n2/N2) were 1/24 versus 0/25.

Long term

Altan 2009 provided follow‐up data 12 weeks after the end of a 12‐week intervention for HRQoL, pain intensity, and tenderness. We found no evidence of a difference between groups for HRQoL (N = 49; MD 8.3, 95% CI −4.84 to 21.4) or tenderness (N = 49; MD 1.1, 95% CI −0.97 to 3.17). However, we found evidence of an effect on pain intensity favoring Pilates (N = 49; MD 13, 95% CI 0.09 to 25.91; Analysis 4.9).

Flexibility exercise training versus Tai Chi

End of intervention

Calandre 2009 compared a 6‐week flexibility intervention (in water) (N = 39) versus Tai Chi (in water) (N = 42). We found no evidence of an effect on HRQoL (FIQ Total; N = 81; MD 3.80, 95% CI −2.89 to 10.49; Analysis 4.1); pain intensity (VAS; N = 81; MD 0.00, 95% CI −9.58 to 9.58; Analysis 4.2); fatigue (FIQ VAS; N = 81; MD 3.00, 95% CI −6.83 to 12.83; Analysis 4.3); stiffness (FIQ VAS; N = 81; MD 6.00, 95% CI −5.33 to 17.33; Analysis 4.4); depression (Beck Depression Inventory; N = 81; MD −0.10, 95% CI −2.72 to 2.52; Analysis 4.6); or tenderness (TP count; N = 81; MD −0.50, 95% CI −1.98 to 0.98; Analysis 4.7). Adverse events were not measured for the flexibility group. However, three participants in the Tai Chi group dropped out, two due to "pain exacerbation" and one due to "chlorine hypersensitivity."

Long term

Calandre 2009 provided follow‐up data 12 weeks after the end of the 6‐week intervention for HRQoL, pain intensity, fatigue, stiffness, depression, and tenderness. We found no evidence of effect between groups in HRQoL (N = 81; MD 2.3, 95% CI −3.69 to 8.29); pain intensity (N = 81; MD −2, 95% CI −11.59 to 7.59); fatigue (N = 81; MD 2, 95% CI −5.62 to 9.62); stiffness (N = 81; MD 0.0, 95% CI −9.37 to 9.37); depression (N = 81; MD −0.31, 95% CI −4.40 to 3.78); and tenderness (N = 81; MD 0.0, 95% CI −1.54 to 1.54). All‐cause withdrawal rates for the flexibility exercise training group (n1/N1) versus the Tai Chi group (n2/N2) were 5/39 versus 10/42 (RR 0.54, 95% CI 0.20 to 1.44; Analysis 4.9).

Flexibility exercise training versus aquatic biodanza

End of intervention

López‐Rodríguez 2012 compared a 12‐week flexibility intervention (N = 20) versus aquatic biodanza (N = 19). We found evidence of an effect favoring aquatic biodanza postintervention on HRQoL (FIQ Total; N = 39; MD 17.07, 95% CI 7.86 to 26.28; Analysis 4.1); fatigue (FIQ VAS; N = 39; MD 11.40, 95% CI 1.09 to 21.71; Analysis 4.3); and stiffness (FIQ VAS; N = 39; MD 14.00, 95% CI 2.68 to 25.32; Analysis 4.4). López‐Rodríguez 2012 did not find between‐group differences postintervention in physical function (FIQ Activities of Daily Living, 0‐to‐3 scale; N = 39; MD 0.37, 95% CI 0.05 to 0.69; Analysis 4.5) or depression (Beck Depression Inventory; N = 39; MD 0.65, 95% CI −3.79 to 5.09; Analysis 4.6). One participant in the flexibility group dropped out of the study due to "worsening of symptom with the training" (information obtained from correspondence with author). No adverse events were reported for the aquatic biodanza group. All‐cause withdrawal rates for the flexibility exercise training group (n1/N1) versus the aquatic biodanza group (n2/N2) were 15/35 versus 16/35 (RR 0.94, 95% CI 0.55 to 1.59).

Long term

Long‐term effects were not investigated.

Flexibility exercise training versus friction massage

End of intervention

Amanollahi 2013 compared a 4‐week flexibility intervention (N = 45) versus friction massage (N = 45). We found evidence of an effect on pain intensity postintervention favoring flexibility (VAS; N = 90; MD −28.00, 95% CI −40.84 to −15.16; Analysis 4.2). Four participants (7%) in the flexibility exercise training group and 11 participants (22.6%) in the friction massage group reported an increase in pain levels. All‐cause withdrawal rates for the flexibility exercise training group (n1/N1) versus the friction massage group (n2/N2) were 0/45 versus 0/45 (RR not estimable).

Long term

Long‐term effects were not investigated.

Flexibility exercise training versus medication (ibuprofen)

End of intervention

Amanollahi 2013 compared a 4‐week flexibility exercise intervention (N = 45) versus medication (ibuprofen) (N = 45). We found no evidence of an effect on pain intensity (VAS; N = 90; MD −8.00, 95% CI −20.21 to 4.21; Analysis 4.2). Five participants in the medication intervention group reported side effects to the ibuprofen medications. All‐cause withdrawal rates for the flexibility exercise training group (n1/N1) versus the medication group (n2/N2) were 6/45 versus 0/45 (RR 13.00, 95% CI 0.75 to 224.13).
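
When one arm has no events, as with the 0/45 withdrawals above, a risk ratio cannot be computed directly. The sketch below assumes the widely used convention of adding 0.5 to every cell of the 2×2 table when any cell is zero; the review does not state which correction was applied, so this is an assumption, and the function name `risk_ratio_zero_cell` is hypothetical.

```python
import math
from statistics import NormalDist


def risk_ratio_zero_cell(events_a, total_a, events_b, total_b, level=0.95):
    """Risk ratio with a 0.5 continuity correction when any cell is zero.

    ASSUMPTION: 0.5 is added to each of the four cells (events and
    non-events in both groups), so each group total grows by 1.
    """
    cells = [events_a, total_a - events_a, events_b, total_b - events_b]
    if any(cell == 0 for cell in cells):
        cells = [cell + 0.5 for cell in cells]
    a, b, c, d = cells
    n_a, n_b = a + b, c + d
    rr = (a / n_a) / (c / n_b)
    se_log = math.sqrt(1 / a - 1 / n_a + 1 / c - 1 / n_b)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Withdrawals: flexibility 6/45 versus ibuprofen 0/45 (Amanollahi 2013).
rr, lower, upper = risk_ratio_zero_cell(6, 45, 0, 45)
print(f"RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under this assumption the point estimate is 13.00 and the interval is close to the reported 0.75 to 224.13. The very wide interval shows how little information a single zero‐event comparison carries. When both arms have zero events (as in the friction massage comparison), the risk ratio is conventionally reported as not estimable rather than corrected.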

Long term

Long‐term effects were not investigated.

Discussion

This review is one of a series of reviews examining the effects of physical activity interventions for adults with fibromyalgia; this review focused on flexibility exercise training.

Summary of main results

Twelve unique studies involving 743 people met our inclusion criteria. The comparisons were as follows.

Flexibility exercise training versus untreated controls. One study involving 28 participants compared flexibility exercise training versus control. Results showed no evidence of an effect on pain intensity, physical function, improvement in pain greater than 30%, or all‐cause withdrawals. Health‐related quality of life, fatigue, and stiffness were not analyzed as data were reported as being skewed. No long‐term effects were investigated. The overall certainty of the evidence was low.
Flexibility exercise training versus land‐based aerobic exercise training. Five studies involving a total of 266 participants compared flexibility exercise training versus aerobic exercise training. Although we found evidence of an effect favoring the flexibility exercise group for stiffness (one study), we found no evidence of an effect on HRQoL, pain intensity, fatigue, physical function, all‐cause withdrawal, depression, or tenderness. When evaluating long‐term effects, we found evidence of an effect of aerobic exercise on tenderness. The overall certainty of the evidence was very low.
Flexibility exercise training versus resistance training. Three studies involving 152 participants compared flexibility exercise training to resistance training. We found no evidence of an effect for pain intensity, fatigue, depression, all‐cause withdrawal, HRQoL, physical function, tenderness, or improvement in pain greater than 30%. Stiffness was not measured. No long‐term effects were investigated in any of the studies. The overall certainty of the evidence was low to very low.
Flexibility exercise training versus other interventions. Four studies involving 299 participants compared flexibility exercise training versus other interventions. Three of these studies had two parallel arms, and one had three parallel arms. Owing to the differences between interventions and comparators, data were not pooled. In between‐group comparisons within single studies comparing flexibility exercise training to a) Pilates, we found evidence of an effect of Pilates on HRQoL and pain intensity, but no evidence of an effect on tenderness; b) Tai Chi, we found no evidence of an effect on HRQoL, pain intensity, fatigue, stiffness, depression, or tenderness; c) aquatic biodanza, we found evidence of an effect of aquatic biodanza on HRQoL, fatigue, and stiffness, but no evidence of an effect on physical function or depression; d) medications, we found no evidence of an effect on pain intensity; and e) friction massage, we found evidence of an effect of flexibility exercise training on pain intensity. These results must be interpreted with caution due to the risk of bias resulting from methodological weaknesses. We assessed the certainty of the evidence for this comparison as very low.

Overall completeness and applicability of evidence

Samples recruited by the included studies consisted mainly of women 35 to 55 years old. Although some men were included, we were unable to calculate a precise number due to lack of information. The 12 included studies were conducted in seven different countries from Europe and North and South America. However, four of the included studies were from Brazil, and the authors of these four studies (Assumpção 2017; Bressan 2008; Matsutani 2012; Valim 2003) may belong to a joint research group, as they are co‐authors on each other's studies. Our findings are thus not easily generalizable beyond middle‐aged, largely Caucasian (understood to be white), female populations. Sample sizes were small, and pooled samples still fell below the criterion of 400 participants; we therefore recommend caution in generalizing results of this review to the wider population of individuals with fibromyalgia.

Flexibility exercises are often embedded in programs targeting individuals with fibromyalgia within the context of current practice; however, in some instances flexibility exercises may be integrated into the warm‐up and/or cool‐down regimens rather than being treated as a separate treatment intervention. In our review, some researchers employed flexibility exercises as a control (Altan 2009; López‐Rodríguez 2012) or as part of a relaxation intervention (Richards 2002), which may further underscore the lack of recognition of flexibility exercise training as a unique treatment on its own. It is thus plausible that we may have captured only some of the published papers on flexibility exercise and fibromyalgia.

The flexibility exercise training sessions lasted from 40 to 60 minutes and were a mixture of (unsupervised) home‐based programs and supervised group sessions. The flexibility interventions in this review did not meet all recommended FITT (frequency, intensity, time, and type) principles for flexibility exercise training for healthy individuals (see Table 1 and Table 4) (ACSM 2013). Consequently, the benefits of flexibility exercise training may be underestimated in these studies.

According to the 2013 ACSM guidelines for healthy adults (ACSM 2013), the recommended frequency for flexibility training regimens is two to three days per week, with daily being more effective. Frequency in the included studies ranged from one to three days per week and was fixed throughout the program; no study exceeded three days per week. Regarding the intensity of the flexibility exercise training program, the 2013 ACSM guidelines recommend the stretch to be taken to the point of tightness or slight discomfort. Eleven of the 12 included studies did not provide information on the intensity of their programs, thus making judgement difficult. The 2013 ACSM guidelines recommend holding the stretch for 10 to 30 seconds. Seven of the 12 included studies met the recommended time for holding each stretch; in four studies this was unclear; and one study did not meet the recommended hold. For type of flexibility exercise, the 2013 ACSM guidelines recommend a series of flexibility exercises for each of the major muscle‐tendon units with static, dynamic, ballistic, and proprioceptive neuromuscular facilitation (PNF) all stated as being effective. Most studies met this criterion, with only one study providing insufficient information to permit a judgement. For volume and pattern, the guidelines suggest that a reasonable target is to perform 60 seconds of total stretching for each flexibility exercise with each stretch repeated two to four times. Only three studies met the recommended guidelines, with the remaining studies providing insufficient information to permit a judgement.

Quality of the evidence

The evidence presented in this review was obtained from trials published in academic journals, registered and published RCT protocols, and trial author responses to requests for information. Using the GRADE system of rating evidence for major outcomes, we judged the overall certainty of evidence for the comparison of flexibility exercise training versus land‐based aerobic exercise training to be very low after downgrading due to issues related to selection and performance bias, and potential limitations related to inconsistency (i.e. heterogeneity of interventions) or imprecision (i.e. total cumulative sample size lower than 400). The sample sizes of the included trials were often small, and even after pooling the data in the meta‐analysis, participant numbers were smaller than desired. In some trials, flexibility was used as a proxy (i.e. flexibility exercise training was used as the control or combined with relaxation), making judgements on the benefits of flexibility exercise training challenging. The available evidence is limited by the number and quality of the included trials, preventing us from reaching robust conclusions regarding the benefits and harms of flexibility exercise training for adults with fibromyalgia. We cannot offer a thorough understanding of adverse effects from flexibility exercise training due to the lack of information provided in the included studies. We found that withdrawal rates did not differ between flexibility and aerobic training. We rated the certainty of the evidence as very low for long‐term benefits of flexibility exercise for HRQoL and pain intensity after downgrading for selection bias, indirectness (i.e. flexibility was used along with relaxation as the control), and imprecision (i.e. small number of participants) (see Table 5).

For the comparison of flexibility exercise training versus aerobic exercise training, we are thus uncertain whether flexibility exercise training leads to improvements in HRQoL, pain intensity, fatigue, stiffness, and physical function or decreases withdrawals and adverse events because the certainty of the evidence is very low.

For the comparison of flexibility exercise training versus untreated control, there was only one study and the overall certainty of the evidence was low for the measured outcomes (pain intensity and physical function). Selection and performance bias issues as well as imprecision (i.e. total cumulative sample size lower than 400) led to downgrading of the evidence (see Table 6). Withdrawal rates did not differ between flexibility exercise training and untreated control. Consequently, flexibility exercise training may lead to little or no difference in pain intensity, physical function, and withdrawals.

For the comparison of flexibility exercise training versus resistance training, we found similar issues to the comparison of flexibility exercise training versus aerobic exercise training, which led to downgrading of the evidence for major outcomes (HRQoL, pain intensity, fatigue, and physical function) to low to very low certainty (see Table 7). For this comparison one study reported on the outcome of greater than 30% improvement of pain. The certainty of evidence was low owing to selection and performance bias and small sample size. Flexibility may thus lead to little or no difference in improvement of HRQoL, pain intensity, and pain greater than 30%. We are uncertain whether flexibility improves fatigue and physical function and decreases withdrawals and adverse events because the certainty of the evidence is very low.

For the comparison of flexibility exercise training versus other interventions, the certainty of evidence ranged from low to very low for HRQoL, pain intensity, fatigue, stiffness, and physical function. We downgraded the certainty of the evidence owing to issues related to risk of bias (selection and performance bias), imprecision (small number of participants), and heterogeneity of the interventions (see Table 8). Flexibility may thus lead to little or no difference in physical function, and it is unclear whether flexibility improves HRQoL, pain intensity, fatigue, and stiffness and decreases withdrawals and adverse events because the certainty of the evidence is very low.

Potential biases in the review process

We attempted to control for bias in the review process in the following ways.

We followed our protocol and documented any deviations from it and reasons for the deviations. We strove for transparency in our decisions and procedures.
We applied no language restrictions on our search.
We described inclusion criteria in sufficient detail to avoid inconsistent application in study selection and documented the inclusion criteria. We updated searches periodically and utilized multiple databases.
By searching clinical trial registries (e.g. ClinicalTrials.gov), we enhanced the opportunity to identify unpublished trials and selective reporting of outcomes. Publication bias may lead to overestimation of treatment effect by up to 12%.
We contacted primary authors for clarification and additional information where indicated, although responses were not always obtained. We asked our questions in open‐ended fashion to avoid leading questions or answers.
Our team includes multidisciplinary views and a range of expertise, which co‐create the synthesis of the evidence: our views include library science, systematic reviews and methods, critical appraisal, clinical rheumatology, exercise physiology, physiotherapy, kinesiology, knowledge translation, and lived experience (i.e. consumers).
We used a standardized procedure to determine selection and inclusion and assessment of studies in the review, and review authors were trained in data extraction.
Two members of our multidisciplinary team presented the perspective of consumers (i.e. one team member had fibromyalgia and another team member had another rheumatic disease) and brought the perspective of lived experience during the protocol and review process.
We used intention‐to‐treat data preferentially.

Agreements and disagreements with other studies or reviews

We found one previous review on flexibility exercise for fibromyalgia (Lorena 2015). The search for the Lorena 2015 review generated five RCTs published between 1986 and 2010. These five studies were assessed for methodological quality using the PEDro scale, which led to one study, Bressan 2008, being excluded for low methodological quality (PEDro scale = 2). Lorena 2015 performed no meta‐analysis. One of the four studies included in Lorena 2015 was a thesis at the time of their review (Assumpcao 2010); it has subsequently been published and is included in our review (Assumpção 2017).

All four studies included in Lorena 2015 were included in our review (with the thesis by Assumpcao 2010 being a companion study to Assumpção 2017 (confirmed by thesis author)). Similar to our review, Lorena 2015 observed a greater concentration of studies investigating flexibility in adults with fibromyalgia after the year 2000. We agree with their general conclusions on the flexibility intervention parameters: flexibility training parameters were poorly described with heterogeneity in the time, frequency, and intensity of sessions between studies. We also agree with the statement by Lorena 2015 that there is a “need for further studies to establish the real benefits of the technique, because the majority of published studies shows low methodological quality.”

In contrast to our review, Lorena 2015 assessed methodological quality with the PEDro scale (we used the Cochrane ‘Risk of bias’ tool). Lorena 2015 states that all studies demonstrated improvement in pain intensity, as well as quality of life and physical condition; however, our meta‐analyses for pain intensity do not support this. In addition, their review included only McCain 1986, which is a preliminary summary of McCain 1988; results from the later study were based on a larger sample for the flexibility group. Our review included data from McCain 1988 (we treated McCain 1986 as a companion study). Matsutani 2012 was included in our meta‐analysis of studies comparing flexibility and aerobic training, and Jones 2002 and Assumpcao 2010 (companion study of Assumpção 2017) were included in our meta‐analysis of studies comparing flexibility and resistance training. Based on our results, the absolute and relative changes show no evidence of an effect for the flexibility groups.

Theadom 2015 conducted a systematic review examining mind‐and‐body therapy for fibromyalgia. Theadom 2015 categorized their 61 included studies into five broad groups: psychological therapies, biofeedback, mindfulness meditation therapies, movement therapies, and relaxation‐based therapies. Two of the 11 included studies within their movement therapies category, which included interventions such as yoga, Tai Chi, and Pilates, were included in our review (Altan 2009; Calandre 2009). In agreement with our review, physical function and pain intensity were used as major outcomes in Theadom 2015. However, fatigue and quality of life were used as their minor outcomes (these were included among our major outcomes). Also in agreement with our review, the authors of Theadom 2015 found very low‐certainty evidence for studies investigating the effects of movement therapies, with trial quality being reduced by unclear details or high risk of allocation concealment and non‐blinding of outcome assessors.

There are several interdisciplinary guidelines on the management of fibromyalgia, from Europe (EULAR; European League Against Rheumatism) (Macfarlane 2017), Canada (Fitzcharles 2013), Israel (Ablin 2013), and Germany (Arnold 2012; Langhorst 2012; Winkelmann 2012). The most recently revised recommendations are from EULAR (Macfarlane 2017). Although they provide no specific recommendations for flexibility exercise training, the authors state that the EULAR recommendations agree with those from other countries on the principles of the approach to management. They state that therapy should be tailored to the individual, and that non‐pharmacological therapies should be the first line of treatment. As flexibility exercise training is studied further in larger trials in this population, future treatment guidelines may begin to discuss its possible benefits.

Previous Cochrane Reviews of aerobic and resistance training for adults with fibromyalgia identified evidence of an effect associated with exercise training in comparison to controls (Bidonde 2017; Busch 2013). Given that only one study in this review permitted us to evaluate the effects of flexibility training compared to control, these previous reviews could serve as a benchmark against which to gauge the effects of flexibility training. The Cochrane Review of aerobic training identified evidence of an effect of aerobic training compared with controls on HRQoL, pain intensity, stiffness, and physical function (Bidonde 2017). If, as in our review, there is no evidence of a difference between flexibility exercise training and aerobic exercise training for similar outcomes (e.g. HRQoL, pain intensity, and physical function), it is plausible that flexibility training may also lead to improvement in these outcomes compared to controls.

Figure 1

Study flow diagram.

Figure 2

Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 3

Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Figure 4

Forest plot of comparison: 1 Flexibility vs aerobic (at end of intervention), outcome: 1.6 Depression, 0‐63, lower is best (end of intervention).

Analysis 1.1

Comparison 1 Flexibility versus aerobic (end of intervention), Outcome 1 HRQoL, FIQ Total, 0‐100, lower is best (end of intervention).

Analysis 1.2

Comparison 1 Flexibility versus aerobic (end of intervention), Outcome 2 Pain, Intensity, 0‐100, lower is best (end of intervention).

Analysis 1.3

Comparison 1 Flexibility versus aerobic (end of intervention), Outcome 3 Fatigue, 0‐100, lower is best (end of intervention).

Analysis 1.4

Comparison 1 Flexibility versus aerobic (end of intervention), Outcome 4 Stiffness, 0‐100, lower is best (end of intervention).

Analysis 1.5

Comparison 1 Flexibility versus aerobic (end of intervention), Outcome 5 Physical function, 0‐100, lower is best (end of intervention).

Analysis 1.6

Comparison 1 Flexibility versus aerobic (end of intervention), Outcome 6 Depression, 0‐100, lower is best (end of intervention).

Analysis 1.7

Comparison 1 Flexibility versus aerobic (end of intervention), Outcome 7 Tenderness, 0‐18, lower is best (end of intervention).

Analysis 1.8

Comparison 1 Flexibility versus aerobic (end of intervention), Outcome 8 Withdrawals.

Analysis 1.9

Comparison 1 Flexibility versus aerobic (end of intervention), Outcome 9 Long‐term effects.

Analysis 2.1

Comparison 2 Flexibility versus control (end of intervention), Outcome 1 Pain, Intensity, 0‐100, lower is best (end of intervention).

Analysis 2.2

Comparison 2 Flexibility versus control (end of intervention), Outcome 2 Physical function, 0‐100, lower is best (end of intervention).

Analysis 2.3

Comparison 2 Flexibility versus control (end of intervention), Outcome 3 Withdrawals.

Analysis 3.1

Comparison 3 Flexibility versus resistance (end of intervention), Outcome 1 HRQoL, FIQ Total, 0‐100, lower is best (end of intervention).

Analysis 3.2

Comparison 3 Flexibility versus resistance (end of intervention), Outcome 2 Pain, Intensity, 0‐100, lower is best (end of intervention).

Analysis 3.3

Comparison 3 Flexibility versus resistance (end of intervention), Outcome 3 Fatigue, 0‐100, lower is best (end of intervention).

Analysis 3.4

Comparison 3 Flexibility versus resistance (end of intervention), Outcome 4 Physical function, 0‐100, lower is best (end of intervention).

Analysis 3.5

Comparison 3 Flexibility versus resistance (end of intervention), Outcome 5 Depression, 0‐63, lower is best (end of intervention).

Analysis 3.6

Comparison 3 Flexibility versus resistance (end of intervention), Outcome 6 Tenderness, 0‐18, lower is best (end of intervention).

Analysis 3.7

Comparison 3 Flexibility versus resistance (end of intervention), Outcome 7 > 30% improvement of pain (end of intervention).

Analysis 3.8

Comparison 3 Flexibility versus resistance (end of intervention), Outcome 8 Withdrawals.

Analysis 4.1

Comparison 4 Flexibility versus other comparators (end of intervention), Outcome 1 HRQoL, FIQ Total, 0‐100, lower is best (end of intervention).

Analysis 4.2

Comparison 4 Flexibility versus other comparators (end of intervention), Outcome 2 Pain, Intensity, 0‐100, lower is best (end of intervention).

Analysis 4.3

Comparison 4 Flexibility versus other comparators (end of intervention), Outcome 3 Fatigue, 0‐100, lower is best (end of intervention).

Analysis 4.4

Comparison 4 Flexibility versus other comparators (end of intervention), Outcome 4 Stiffness, 0‐100, lower is best (end of intervention).

Analysis 4.5

Comparison 4 Flexibility versus other comparators (end of intervention), Outcome 5 Physical function, 0‐100, lower is best (end of intervention).

Analysis 4.6

Comparison 4 Flexibility versus other comparators (end of intervention), Outcome 6 Depression, 0‐63, lower is best (end of intervention).

Analysis 4.7

Comparison 4 Flexibility versus other comparators (end of intervention), Outcome 7 Tenderness, 0‐18, lower is best (end of intervention).

Analysis 4.8

Comparison 4 Flexibility versus other comparators (end of intervention), Outcome 8 Withdrawals.

Analysis 4.9

Comparison 4 Flexibility versus other comparators (end of intervention), Outcome 9 Long‐term effects: flexibility vs other comparators.

Summary of findings for the main comparison. Flexibility exercise training compared with aerobic exercise training for adults with fibromyalgia

Flexibility exercise training compared with aerobic exercise training for adults with fibromyalgia
Patient or population: adults with fibromyalgia Settings: group and home program Intervention: flexibility exercise training Comparison: aerobic training Outcome: measured at the end of intervention
Outcomes	Anticipated absolute effects* (95% CI)		Relative effect (95% CI)	№ of participants (studies)	Certainty of the evidence (GRADE)	Comments
Outcomes	Risk with aerobic (end of intervention)	Risk with flexibility	Relative effect (95% CI)	№ of participants (studies)	Certainty of the evidence (GRADE)	Comments
Health‐related quality of life assessed with: FIQ Total (0 is best) 0‐to‐100‐millimeter scale Follow‐up: range 12 weeks to 20 weeks⁵	Mean health‐related quality of life was 42 mm.	Mean 4.14 mm higher (5.77 lower to 14.05 higher)	‐	193 (2 RCTs)	⊕⊝⊝⊝ VERY LOW^1,2,3,4	Absolute change was 4% worse (6% better to 14% worse). Relative change⁷ in the flexibility groups compared to the aerobic groups was 7.53% worse (10.5% better to 25.5% worse). NNTB n/a⁶
Pain intensity assessed with: VAS (0 is best) 0‐to‐100‐millimeter scale Follow‐up: range 8 weeks to 20 weeks⁸	Mean pain intensity was 52 mm.	Mean 4.72 mm higher (1.39 lower to 10.83 higher)	‐	266 (5 RCTs)	⊕⊝⊝⊝ VERY LOW^1,3,4	Absolute change was 5% worse (1% better to 11% worse). Relative change in the flexibility groups compared to the aerobic groups was 6.7% worse (2% better to 15.4% worse).⁷ NNTB n/a⁶
Fatigue assessed with: FIQ and SF‐36 converted (0 is best) 0‐to‐100‐millimeter scale Follow‐up: range 8 weeks to 20 weeks⁹	Mean fatigue was 71 mm.	Mean 4.12 mm lower (13.31 lower to 5.06 higher)	‐	75 (2 RCTs)	⊕⊝⊝⊝ VERY LOW^1,4	Absolute change was 4% better (13% better to 5% worse). Relative change in the flexibility groups compared to the aerobic groups was 6.02% better (19.4% better to 7.4% worse).⁷ NNTB n/a⁶
Stiffness assessed with: FIQ (0 is best) 0‐to‐100‐millimeter scale Follow‐up: 8 weeks¹⁰	Mean stiffness was 79 mm.	Mean 29.6 mm lower (51.47 lower to 7.73 lower)	‐	15 (1 RCT)	⊕⊝⊝⊝ VERY LOW^4,11	Absolute change was 30% better (8% better to 51% better). Relative change in the flexibility group compared to the aerobic group was 39% better (10% better to 68% better).⁷ NNTB n/a⁶
Physical function assessed with: FIQ and SF‐36 converted (0 is best) 0‐to‐100‐millimeter scale Follow‐up: range 8 weeks to 20 weeks¹²	Mean physical function was 17 units.	Mean 6.04 units higher (3.95 lower to 16.03 higher)	‐	60 (1 RCT)	⊕⊝⊝⊝ VERY LOW^1,4	Absolute change was 6% worse (4% better to 16% worse). Relative change in the flexibility group compared to the aerobic group was 13.97% worse (9.1% better to 37.1% worse).⁷ NNTB n/a⁶
Withdrawals All‐cause attrition Follow‐up: 8 to 20 weeks	Study population: 19 per 100	18 per 100 (11 to 29)	RR 0.97 (0.61 to 1.55)	301 (5 RCTs)	‐	Absolute change was 1% fewer withdrawals in the flexibility groups (8% fewer to 21% more). Relative change in the flexibility group was 3% fewer (39% fewer to 55% more).
Adverse events—increase in symptoms, injuries, or serious adverse events	Studies did not measure or report events.	Not all studies measured or reported events.	‐	No reliable estimate	⊕⊝⊝⊝ VERY LOW^1,4	In 1 of the 5 studies, 1 participant in the flexibility group was reported as having a minor adverse event. The following statement was provided: "a patient in the FLEX group had tendinitis of the Achilles tendon, which responded to treatment with local heat and a reduction in exercise for 14 days" (McCain 1988; page 1138). However, it is unclear whether the tendinitis was related to intervention participation.
*The risk in the intervention group (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval; FIQ: Fibromyalgia Impact Questionnaire; NNTB: number needed to treat for an additional beneficial outcome; NNTH: number needed to treat for an additional harmful outcome; RCT: randomized controlled trial; RR: risk ratio; SF‐36: 36‐item Short Form Health Survey; VAS: visual analogue scale
GRADE Working Group grades of evidence High certainty: We are very confident that the true effect lies close to that of the estimate of the effect. Moderate certainty: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. Low certainty: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect. Very low certainty: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.
¹Downgraded two levels due to risk of bias (e.g. selection and performance bias).
²Downgraded one level due to inconsistency (i.e. heterogeneity among trials found).
³Downgraded two levels because flexibility was used as a proxy (i.e. flexibility exercise was used along with relaxation as the control in the study).
⁴Downgraded one level due to imprecision (sample size lower than 400 rule‐of‐thumb).
⁵Study authors: Richards 2002; Valim 2003.
⁶NNTB or NNTH was not calculated, as there were no clinically important between‐group differences.
⁷Relative change calculation as per Cochrane Musculoskeletal Review Group procedures: absolute change divided by the baseline mean of the highest‐weighted aerobic group. Richards 2002 (value was 55 on a 0‐to‐100‐point scale on the FIQ for health‐related quality of life, and 70.4 on a 0‐to‐100‐point scale on the VAS for pain). Valim 2003 (value was 68.4 points on a 0‐to‐100‐point scale on the SF‐36 Vitality for fatigue, and 43.23 on a 0‐to‐100‐point scale on the SF‐36 for function). Bressan 2008 (value was 75.7 points on a 0‐to‐100‐point scale on the FIQ for stiffness).
⁸Study authors: Bressan 2008; Matsutani 2012; McCain 1988; Richards 2002; Valim 2003.
⁹Study authors: Bressan 2008; Valim 2003.
¹⁰Study author: Bressan 2008.
¹¹Downgraded one level for possible selection and performance bias.
¹²Study author: Valim 2003.
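
As a worked illustration of the relative change calculation described in footnote 7, the sketch below divides each pooled mean difference by the baseline mean of the highest‐weighted aerobic group and expresses the result as a percentage. The function name and the dictionary layout are illustrative assumptions and are not part of the review's methods; the input values are those reported in the table and in footnote 7.

```python
def relative_change(mean_difference, baseline_mean):
    """Relative change (%) = absolute change / comparator baseline mean * 100.

    Hypothetical helper: follows the description in footnote 7 only.
    """
    return 100.0 * mean_difference / baseline_mean


# (pooled mean difference, baseline mean of the highest-weighted aerobic group)
# Positive differences mean the flexibility group scored higher (worse).
outcomes = {
    "HRQoL (FIQ Total)": (4.14, 55.0),     # baseline from Richards 2002
    "Pain intensity (VAS)": (4.72, 70.4),  # baseline from Richards 2002
    "Fatigue": (-4.12, 68.4),              # baseline from Valim 2003
    "Stiffness": (-29.6, 75.7),            # baseline from Bressan 2008
    "Physical function": (6.04, 43.23),    # baseline from Valim 2003
}

for name, (md, baseline) in outcomes.items():
    pct = relative_change(md, baseline)
    direction = "worse" if pct > 0 else "better"
    print(f"{name}: {abs(pct):.2f}% {direction}")
```

The same function applied to the confidence limits of each mean difference gives the relative change intervals quoted in the Comments column, up to rounding.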

Table 1. FITT‐VP parameters

Author, year, intervention	Frequency, times per week	Length in weeks	Intensity	Time/duration	Session, minutes	Type/mode	Pattern
Flexibility versus control
Assumpção 2017	2 times/week	12 weeks	Stretch intensity was increased gradually to the point of moderate discomfort.	30 s	40 min	Supervised program focusing on large muscles (triceps surae, gluteus, ischiotibial, paravertebral, latissimus dorsi, hip adductor, pectoralis)	Not mentioned
Flexibility versus aerobic
Bressan 2008	1 time/week	8 weeks	Not mentioned	30 s	40 to 45 min	Static muscular stretching of the triceps surae, ischiotibial, gluteal, paravertebral, latissimocondyloideus, pectoral, trapezius, and respiratory muscles. Stretching was performed in dorsal decubitus or sitting.	Performed in a series of 5 repetitions
Matsutani 2012	1 time/week	8 weeks	Not mentioned	30 s	45 min	All exercises emphasized breathing and postural alignment corrections.	For each exercise there were 4 replications, holding the stretch for 30 s on each repetition, followed by 30 s of rest.
McCain 1988	3 times/week	20 weeks	Not mentioned	Not mentioned	60 min	Exercise consisted of flexibility maneuvers such that sustained heart rate responses greater than 115 beats per minute were not attained.	Not mentioned
Richards 2002	2 times/week	12 weeks	Not mentioned	Not mentioned	60 min	Relaxation and flexibility comprised upper and lower limb stretches and relaxation techniques based on the published regimen by Ost 1987.	Not mentioned
Valim 2003	3 times/week	20 weeks	Not mentioned	30 s	45 min	Stretching program included 17 exercises using both muscles and joints in a general way, including face, cervical, trunk, and extremities.	Not mentioned
Flexibility versus resistance
Assumpção 2017	2 times/week	12 weeks	Stretch intensity was increased gradually to the point of moderate discomfort.	30 s	40 min	Supervised program focusing on large muscles (triceps surae, gluteus, ischiotibial, paravertebral, latissimus dorsi, hip adductor, pectoralis).	Not mentioned
Gavi 2014	2 times/week	16 weeks	Not mentioned	30 s	45 min	Stretching program included major muscle groups. Authors reference the stretching protocol used by Valim 2003.	Not mentioned
Jones 2002	2 times/week	12 weeks	Not mentioned	60 s	60 min	Stretching program included stretches performed in standing, sitting, or lying positions.	Not mentioned
Flexibility versus other
Altan 2009	3 times/week	12 weeks	Not mentioned	6 s	60 min	Non‐weight bearing stretching of cervical, shoulder, thoracic, lumbar, gluteal, leg, and cruris muscles	Not mentioned
Amanollahi 2013	3 times/week	4 weeks	Not mentioned	30 s	Not mentioned	Non‐weight bearing stretching of shoulder blade musculature, paraspinal muscles, neck and low back muscles, hamstrings, and calf muscles	Each time included 3 repetitions of each stretching exercise
Calandre 2009	3 times/week	6 weeks	Not mentioned	Not mentioned	60 min	Stretching exercises were performed on muscles over the main body areas: cervical, upper and lower extremities, and trunk.	Not mentioned
López‐Rodríguez 2012	2 times/week	12 weeks	Not mentioned	Not mentioned	60 min	Flexibility stretching exercises that included global stretches and stretches specific to different muscular areas of the body	Not mentioned

Table 2. Outcome measures used for analysis in the included studies

Outcome	Name of instrument or index/subscale
Health‐related quality of life	FIQ Total¹ (0 to 100)
Pain intensity	Current pain (VAS), FIQ pain¹ (VAS), SF‐36 bodily pain
Fatigue	FIQ fatigue¹ (0 to 100), SF‐36 Vitality (0 to 100)
Stiffness	FIQ stiffness¹ (0 to 100)
Physical function	FIQ physical function¹ (0 to 100), SF‐36
Depression	Beck Depression Inventory (0 to 63), FIQ depression¹ (0 to 100)
Tenderness	Tender point count (0 to 18), total myalgic score
Adverse events	Not a standardized instrument or index/narrative information
FIQ: Fibromyalgia Impact Questionnaire; SF‐36: 36‐item Short Form Health Survey; VAS: visual analogue scale
¹The revised FIQ (Bennett 2009) and any language‐translated version of the FIQ (Portuguese version; Assumpção 2017) were considered to be equivalent to the original version of the FIQ (Burckhardt 1991).

Table 3. Detailed description of exercise protocol

Study	Group (naming of the intervention as described by author)	Flexibility	Aerobic	Strength	Other
Altan 2009	Length: 24 weeks 1. HOME EXERCISE 1 h, 3/week RELAXATION/STRETCHING 1 h, 3/week 2. PILATES 1 h, 3/week	1. Muscle groups/exercises: stretching of cervical, shoulder, thoracic, lumbar, gluteal, leg and cruris muscle groups. Holding each stretch for 6 s and relaxed for 4 s 2. None	1. None 2. None	1. None 2. The protocol comprised 9 modules covering postural education, search for neutral position, sitting exercise, antalgic exercises, and breathing education. Equipment: resistance bands and 26‐centimeter Pilates balls were used as supportive equipment. The following components were included in the exercises: resistance and stabilization, flexibility and range of motion, proper body alignment, balance, co‐ordination, and body awareness. 1‐hour program (5 min breathing, 10 min warm‐up, 35 min conditioning, 10 min cool‐down)	1. None 2. None
Amanollahi 2013	Length: 4 weeks 1. FLEXIBILITY 3/week 2. MEDICATION 3/day and 1/day 3. FRICTION MASSAGE 3/week	1. Static and non‐weight‐bearing stretching of shoulder blade musculature, paraspinal muscles, neck and low back muscles, hamstrings, and calf muscles. 3 reps with 30 s holds 2. None 3. None	1. None 2. None 3. None	1. None 2. None 3. None	1. None 2. 400 mg ibuprofen (Aria Pharmaceutical Co., Iran) 3 x/day and 25 mg nortriptyline (Darou Pakhsh Pharmaceutical Mfg. Co, Iran) 1/day 3. 3 30‐second friction massages using the second and third fingers with a pressure of approximately 0.5 to 1 kg/point on the painful spot so that a mild pallor occurred on the practitioner’s nails
Assumpção 2017	Length: 12 weeks 1. FLEXIBILITY 2/week 2. RESISTANCE 2/week 3. CONTROL	1. Supervised program focusing on large muscles (triceps surae, gluteus, ischiotibial, paravertebral, latissimus dorsi, hip adductor, pectoralis). In early stages 3 reps, from fifth week 4 reps, from ninth week 5 reps; intensity of stretch was gradually increased to point of moderate discomfort and held for 30 s, for 40 min. 2. None 3. None	1. None 2. None 3. None	1. None 2. Dumbbells for upper limbs and shin pads for lower limbs; exercises targeted triceps surae, quadriceps, hip adductors and abductors, hip flexors, elbow flexors and extensors, pectoralis major, and rhomboids. Duration of 40 min (5 min breathing, 10 min warm‐up, 35 min conditioning, 10 min cool‐down); first 2 sessions there was no load; 0.5 kg was added each week if participant identified the effort as slightly intense on the Borg Scale (score = 13); 8 reps 3. None	1. None 2. None 3. None
Bressan 2008	Length: 8 weeks 1. STRETCHING 1/week 2. PHYSICAL CONDITIONING EXERCISES 1/week	1. Static stretches of triceps surae, ischiotibial, gluteal, paravertebral, latissimocondyloideus, pectoral, trapezius, and respiratory muscles. In addition, stretching at home was recommended. Exercises were performed in a series of 5 repetitions, with 30 s holds for 40 to 45 min. 2. None	1. None 2. Walking for a period of 30 min using a motorized treadmill (5 min warm‐up, 25 min walking, 5 min rest). The walking speed was determined at 60% to 75% of the maximum HR, deducting participant's age from 220.	1. None 2. None	1. None 2. None
Calandre 2009	Length: 6 weeks 1. STRETCHING (in water) 1 h, 3/week 2. TAI CHI (in water) 1 h, 3/week	1. Training was done in a pool with water heated at 36 °C and was preceded by a shower with warm water (34.5 °C to 35.5 °C). In order to facilitate the stretching, participants were given 1‐meter‐long wooden sticks. Stretching was performed over the muscles of main body areas: cervical area, upper and lower extremities, and trunk. 2. None	1. None 2. None	1. None 2. None	1. None 2. Participants were taught the 16 movements which constitute the Tai Chi therapy without the assistance of additional material. Tai Chi is performed standing in shoulder‐depth water using a combination of deep breathing and slow, broad movements of the arms, legs, and torso.
Gavi 2014	Length: 16 weeks 1. FLEXIBILITY 45 min, 2/week 2. RESISTANCE TRAINING 45 min, 2/week	1. Stretching program included the major muscle groups. Valim 2003 is referenced for stretching program. 2. None	1. None 2. Resistance training group received supervised progressive training in the standing and sitting positions using weight machines. The intensity was moderate, with an overload of 45% of the estimated 1 RM, calculated based on maximal repetitions. 8 major groups were trained (quadriceps femoris, hamstrings, biceps brachii, triceps brachii, pectoral, calf, deltoid, and latissimus dorsi) in 12 different exercises, with 3 sets of 12 repetitions (leg press, leg extension, hip flexion, pectoral fly, triceps extension, shoulder flexion, leg curl, calf, pulldown, shoulder abduction, biceps flexion, and shoulder extension).	1. None 2. None	1. None 2. None
Jones 2002	Length: 12 weeks 1. FLEXIBILITY 1 h, 2/week 2. STRENGTH 1 h, 2/week	1. The muscles included in the protocol were gastrocnemius, tibialis anterior, quadriceps, hamstrings, gluteus, abdominals, erector spinae, pectorals, latissimus dorsi, rhomboids, deltoids, biceps, triceps. Static stretch, participant controlled intensity of stretches. 10 min warm‐up, 40 min stretching, 10 min cool‐down of guided imagery and relaxation 2. Warm‐up and cool‐down	1 and 2 warm‐up	1. None 2. The muscles included in the protocol were gastrocnemius, tibialis anterior, quadriceps, hamstrings, gluteus, abdominals, erector spinae, pectorals, latissimus dorsi and rhomboids, deltoids, biceps, triceps. Equipment used: 1‐ to 3‐pound weights and/or surgical tubing. Concentric/eccentric contractions with minimized work during eccentric phase. Intensity and progression directed by participant. Single set throughout, repetitions progressed from 4 or 5 to 12. Participants encouraged to decrease activity during fibromyalgia flares. 1‐hour program including 5 min warm‐up, 45 min strengthening, 10 min cool‐down	1. None 2. None
López‐Rodríguez 2012	Length: 12 weeks 1. (CONTROL) FLEXIBILITY 1 h, 2/week 2. EXPERIMENTAL GROUP biodanza 1 h, 2/week	1. Flexibility stretching exercises that included global stretches and stretches specific to different muscular areas of the body 2. None	1. None 2. Biodanza in the water with a water temperature of approximately 29 °C, preceded by a shower at 33 °C to 35 °C; biodanza‐type movements such as walking, slow movements of upper and lower extremities, cool‐down stretching. The duration of the intervention was 60 min (10 min warm‐up, 4 min biodanza, 10 min cool‐down).	1. None 2. None	1. None 2. None
Matsutani 2012	Length: 8 weeks 1. STRETCHING 45 min, 1/week 2. AEROBIC 30 min, daily	1. Static stretching exercises were performed in a segment of the muscle groups: triceps leg, gluteal, iliopsoas, hamstring, paraspinal, latissimus dorsi, diaphragm, adductor pubic associated with lumbar pelvic movements, trapezius, and major and minor pectoralis. All exercises emphasized breathing and postural alignment. Static stretches held 30 s, repeated 4 times with 30 s rest, progressed from lying to sitting to standing upright or in flexion. A mirror was used as an aid to the perception of movements of the upper limbs and postural alignment. 2. None	1. None 2. A treadmill walk was performed with intensity defined according to HR, between 60% and 70% HR for age (formula used, HR max = 220−age).	1. None 2. None	1. None 2. None
McCain 1988	Length: 20 weeks 1. FLEXIBILITY 1 h, 3/week 2. AEROBIC EXERCISE 1 h, 3/week	1. Exercises consisted of flexibility maneuvers such that sustained HR responses greater than 115 beats per min were not attained. 2. None	1. None 2. After a 10‐minute preliminary warm‐up exercise, individuals were subjected to sustained HR elevation training through the use of a bicycle ergometer. Heart rate was maintained in excess of 150 beats per minute for gradually increasing time periods.	1. None 2. None	1. None 2. None
Richards 2002	Length: 12 weeks 1. RELAXATION AND FLEXIBILITY 1 h, 2/week 2. AEROBIC EXERCISE 1 h, 2/week	1. Relaxation and flexibility comprised upper and lower limb stretches and relaxation techniques based on the published regimen by Ost 1987. As the classes proceeded, more techniques were introduced progressing through progressive muscle relaxation, release‐only relaxation and visualization, cue‐controlled relaxation, and differential relaxation. 2. None	1. None 2. Exercise therapy comprised an individualized aerobic exercise program, mostly walking on treadmills and cycling on exercise bicycles. Each individual was encouraged to steadily increase the amount of exercise as tolerated.	1. None 2. None	1. None 2. None
Valim 2003	Length: 20 weeks 1. STRETCHING EXERCISE GROUP 45 min, 3/week 2. AEROBIC EXERCISE GROUP 45 min, 3/week	1. 17 static exercises using both muscles and joints in a general way, including face, cervical, trunk, and extremities. Exercises chosen to provide flexibility without increasing HR. Each maximum position was sustained for 30 s. 2. None	1. None 2. Exercise group underwent a walking program monitored with frequency meters and supervised by a physiotherapist. The walking speed (training load) was determined by the training HR. Training HR defined as the load beat immediately preceding the one in which the anaerobic threshold occurred. Each training session was preceded by a warm‐up period in which participants were instructed to walk freely and slowly for 5 to 10 min. After each session the participants were placed in a circle and performed rhythmic movements, to promote cooling off, for 5 min.	1. None 2. None	1. None 2. None
HR: heart rate; RM: repetition maximum; Max: maximum
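
Two of the aerobic comparators in the table above (Bressan 2008; Matsutani 2012) set walking intensity as a percentage of an age‐predicted maximum heart rate, using HR max = 220 − age. The sketch below shows that arithmetic only. The function name, the default percentages, and the example age of 50 years are illustrative assumptions; neither study reports this code, and individual prescriptions in the trials may have differed.

```python
def target_hr_range(age_years, lower_fraction=0.60, upper_fraction=0.75):
    """Return (lower, upper) target heart rate in beats per minute.

    Uses the age-predicted maximum HR = 220 - age quoted in the table.
    Default fractions reflect the 60% to 75% range described for
    Bressan 2008; Matsutani 2012 describes 60% to 70%.
    """
    hr_max = 220 - age_years
    return hr_max * lower_fraction, hr_max * upper_fraction


# Hypothetical 50-year-old participant (age chosen for illustration only).
low, high = target_hr_range(50)
print(f"Bressan-style range: {low:.0f} to {high:.0f} beats per minute")

low, high = target_hr_range(50, upper_fraction=0.70)
print(f"Matsutani-style range: {low:.0f} to {high:.0f} beats per minute")
```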

Table 4. Congruence with 2013 ACSM flexibility criteria for healthy adults

Author, year	Met ACSM 2013 criteria
	Frequency	Intensity	Time	Type	Volume	Pattern
	2 to 3 d/week with daily being most effective	Stretch to the point of feeling tightness or slight discomfort	10 s to 30 s	A series of flexibility exercises for each of the major muscle‐tendon units	60 s of total stretching time for each flexibility exercise	2 to 4 repetitions
Altan 2009	Yes	Unclear	No	Yes	Unclear	Unclear
Amanollahi 2013	Yes	Unclear	Yes	Yes	Yes	Yes
Assumpção 2017	Yes	Yes	Yes	Yes	Unclear	Unclear
Bressan 2008	No	Unclear	Yes	Yes	Yes	Yes
Calandre 2009	Yes	Unclear	Unclear	Yes	Unclear	Unclear
Gavi 2014	Yes	Unclear	Yes	Yes	Unclear	Unclear
Jones 2002	Yes	Unclear	Yes	Yes	Unclear	Unclear
López‐Rodríguez 2012	Yes	Unclear	Unclear	Yes	Unclear	Unclear
Matsutani 2012	No	Unclear	Yes	Yes	Yes	Yes
McCain 1988	Yes	Unclear	Unclear	Unclear	Unclear	Unclear
Richards 2002	Yes	Unclear	Unclear	Yes	Unclear	Unclear
Valim 2003	Yes	Unclear	Yes	Yes	Unclear	Unclear

Table 5. Quality of evidence—GRADE assessment: long‐term effects of flexibility exercise training versus aerobic exercise training

Certainty assessment						№ of participants		Certainty	Importance
№ of studies and study design	Risk of bias	Inconsistency	Indirectness	Imprecision	Other considerations	Flexibility	Aerobic (end of intervention)	Certainty	Importance
HRQoL (follow‐up 36 weeks after end of intervention; assessed with FIQ Total 0 to 100, lower is best)
1 randomized trial	Serious^a	Not serious	Very serious^b	Serious^c	None	67	68	⨁◯◯◯ VERY LOW	CRITICAL
Pain intensity (follow‐up 36 weeks after end of intervention; assessed with VAS 0 to 100, lower is best)
1 randomized trial	Serious^a	Not serious	Very serious^b	Serious^c	None	67	69	⨁◯◯◯ VERY LOW	CRITICAL
Fatigue, stiffness, and physical function: not measured
Withdrawals, adverse events: not reported
FIQ: Fibromyalgia Impact Questionnaire; HRQoL: health‐related quality of life; VAS: visual analogue scale
^aDowngraded one level for selection bias.
^bDowngraded two levels because flexibility was used as a proxy (i.e. flexibility exercise was used along with relaxation as the control in the study).
^cDowngraded one level for imprecision (sample size lower than 400 rule‐of‐thumb).

Table 6. Quality of evidence—GRADE assessment: flexibility intervention versus control

№ of studies and study design	Risk of bias	Inconsistency	Indirectness	Imprecision	Other considerations	№ of participants		Certainty	Importance
						Flexibility	Control (end of intervention)
Pain, intensity, 0 to 100, lower is best (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	14	14	⨁⨁◯◯ LOW	CRITICAL
Physical function, 0 to 100, lower is best (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	14	14	⨁⨁◯◯ LOW	CRITICAL
Withdrawals
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	4/18 (22.2%)	2/16 (12.5%)	⨁⨁◯◯ LOW	IMPORTANT
HRQoL, fatigue, and stiffness: data were described as skewed, thus were not used
Adverse events: not measured/reported for either group
HRQoL: health‐related quality of life.
^aDowngraded one level because of selection and performance bias.
^bDowngraded one level because of imprecision (sample size lower than 400 rule‐of‐thumb).
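
To illustrate how the withdrawal counts in Table 6 translate into a relative effect, the following minimal Python sketch computes a risk ratio with a 95% confidence interval from the 4/18 and 2/16 counts above. It assumes the usual log risk ratio with a normal approximation; the function name risk_ratio_ci is hypothetical, and the result is an unadjusted single‐study calculation for illustration, not a value reported in this review.

```python
import math

def risk_ratio_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A vs group B with a normal-approximation CI on the log scale."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Withdrawal counts from Table 6: flexibility 4/18, control 2/16
rr, lower, upper = risk_ratio_ci(4, 18, 2, 16)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With counts this small the interval is wide and spans 1, which is in line with the one‐level downgrade for imprecision.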

Table 7. Quality of evidence—GRADE assessment: flexibility intervention versus resistance training intervention

Certainty assessment						№ of participants		Certainty	Importance
№ of studies and study design	Risk of bias	Inconsistency	Indirectness	Imprecision	Other considerations	Flexibility	Resistance (at end of intervention)	Certainty	Importance
HRQoL, FIQ Total, 0 to 100, lower is best (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	28	28	⨁⨁◯◯ LOW	CRITICAL
Pain, intensity, 0 to 100, lower is best (end of intervention)
3 randomized trials	Serious^a	Not serious	Not serious	Serious^b	None	73	79	⨁⨁◯◯ LOW	CRITICAL
Fatigue, 0 to 100, lower is best (end of intervention)
2 randomized trials	Very serious^c	Serious^d	Not serious	Serious^b	None	59	63	⨁◯◯◯ VERY LOW	IMPORTANT
Physical function, 0 to 100, lower is best (end of intervention)
2 randomized trials	Serious^a	Very serious^e	Not serious	Serious^b	None	45	51	⨁◯◯◯ VERY LOW	IMPORTANT
> 30% improvement of pain (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^b	None	5/14 (35.7%)	6/16 (37.5%)	⨁⨁◯◯ LOW	IMPORTANT
Withdrawals
3 randomized trials	Serious^a	Not serious	Not serious	Serious^b	None	19/77 (24.7%)	14/82 (17.1%)	⨁⨁◯◯ LOW	IMPORTANT
Stiffness: not measured
Adverse events: not measured/reported for the flexibility training group. For the resistance training group, "one subject in the resistance group interrupted participation in the study because of worsening pain" (page 13 of 22).
FIQ: Fibromyalgia Impact Questionnaire; HRQoL: health‐related quality of life.
^aDowngraded one level because of selection and performance bias.
^bDowngraded one level for imprecision (sample size lower than 400 rule‐of‐thumb).
^cDowngraded two levels because of selection and performance bias.
^dDowngraded one level for inconsistency.
^eConsiderable heterogeneity (I² = 91%).

Table 8. Quality of evidence—GRADE assessment: flexibility intervention versus other comparators

Certainty assessment						№ of participants		Certainty	Importance
№ of studies and study design	Risk of bias	Inconsistency	Indirectness	Imprecision	Other considerations	Flexibility	Other comparators (end of intervention)	Certainty	Importance
HRQoL, FIQ Total, 0 to 100, lower is best (end of intervention)
3 randomized trials	Serious^a	Serious^b	Not serious	Serious^c	Studies not pooled	83	86	⨁◯◯◯ VERY LOW	CRITICAL
Pain, intensity, 0 to 100, lower is best (end of intervention)
3 randomized trials	Serious^a	Serious^b	Not serious	Serious^c	Studies not pooled	153	151	⨁◯◯◯ VERY LOW	CRITICAL
Fatigue, 0 to 100, lower is best (end of intervention)
2 randomized trials	Serious^a	Serious^b	Not serious	Serious^c	Studies not pooled	59	61	⨁◯◯◯ VERY LOW	IMPORTANT
Stiffness, 0 to 100, lower is best (end of intervention)
2 randomized trials	Serious^a	Serious^b	Not serious	Serious^c	Studies not pooled	59	61	⨁◯◯◯ VERY LOW	IMPORTANT
Physical function, 0 to 100, lower is best (end of intervention)
1 randomized trial	Serious^a	Not serious	Not serious	Serious^c	1 study	20	19	⨁⨁◯◯ LOW	IMPORTANT
Withdrawals
4 randomized trials	Serious^a	Serious	Not serious	Serious^c		27/188 (14.4%)	26/192 (13.5%)	⨁◯◯◯ VERY LOW	IMPORTANT
Adverse events: not reported for the flexibility group. In the medication arm, 5 participants who received ibuprofen and 1 participant who received nortriptyline experienced side effects (from translated version of article).
FIQ: Fibromyalgia Impact Questionnaire; HRQoL: health‐related quality of life.
^aDowngraded one level because of selection and performance bias.
^bInterventions not consistent across studies.
^cDowngraded one level for imprecision (sample size lower than 400 rule‐of‐thumb).
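
The certainty ratings in Tables 5 to 8 follow from the downgrades recorded in each row. The short Python sketch below tallies them under the standard GRADE convention that evidence from randomized trials starts at HIGH and drops one level per "Serious" rating and two per "Very serious" rating. The function name grade_certainty is hypothetical, and the sketch deliberately ignores the "Other considerations" column (for example publication bias or upgrading factors), which is an assumption of this illustration.

```python
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]
DOWNGRADE = {"Not serious": 0, "Serious": 1, "Very serious": 2}

def grade_certainty(risk_of_bias, inconsistency, indirectness, imprecision):
    """Start at HIGH (randomized trials) and subtract the recorded downgrades,
    never going below VERY LOW."""
    total = sum(
        DOWNGRADE[rating]
        for rating in (risk_of_bias, inconsistency, indirectness, imprecision)
    )
    return LEVELS[max(0, 3 - total)]

# Table 6, pain intensity: two one-level downgrades
print(grade_certainty("Serious", "Not serious", "Not serious", "Serious"))
# Table 7, fatigue: downgrades exceed three levels, so the rating floors out
print(grade_certainty("Very serious", "Serious", "Not serious", "Serious"))
```

Applied to the rows above, the tally gives LOW for pain intensity in Table 6 and VERY LOW for fatigue in Table 7, matching the ratings shown in those tables.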

Comparison 1. Flexibility versus aerobic (end of intervention)

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
1 HRQoL, FIQ Total, 0‐100, lower is best (end of intervention)	2	193	Mean Difference (IV, Random, 95% CI)	4.14 [‐5.77, 14.05]
2 Pain, Intensity, 0‐100, lower is best (end of intervention)	4	131	Mean Difference (IV, Random, 95% CI)	2.78 [‐6.29, 11.85]
3 Fatigue, 0‐100, lower is best (end of intervention)	2	75	Mean Difference (IV, Random, 95% CI)	‐4.12 [‐13.31, 5.06]
4 Stiffness, 0‐100, lower is best (end of intervention)	1		Mean Difference (IV, Random, 95% CI)	Subtotals only
5 Physical function, 0‐100, lower is best (end of intervention)	1		Mean Difference (IV, Random, 95% CI)	Totals not selected
6 Depression, 0‐100, lower is best (end of intervention)	3	94	Mean Difference (IV, Random, 95% CI)	‐6.28 [‐19.28, 6.71]
7 Tenderness 0‐18, lower is best (end of intervention)	4	253	Std. Mean Difference (IV, Random, 95% CI)	0.20 [‐0.08, 0.48]
8 Withdrawals	5	301	Risk Ratio (M‐H, Random, 95% CI)	0.97 [0.61, 1.55]
9 Long‐term effects	1		Mean Difference (IV, Random, 95% CI)	Totals not selected
9.1 HRQoL	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
9.2 Pain	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
9.3 Tenderness	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
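
Each pooled mean difference in Comparison 1 is reported with a 95% CI. As a reading aid, the following Python sketch backs out the standard error and z statistic from a reported interval, assuming a symmetric normal‐approximation interval (estimate ± 1.96 × SE). The function name se_and_z_from_ci is hypothetical; the input values are copied from the rows above, with positive differences meaning higher (worse) scores in the flexibility groups.

```python
def se_and_z_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the standard error and z statistic from a symmetric 95% CI."""
    se = (upper - lower) / (2 * z_crit)
    return se, estimate / se

# Pooled mean differences copied from Comparison 1: (estimate, lower, upper)
pooled = {
    "HRQoL": (4.14, -5.77, 14.05),
    "Pain": (2.78, -6.29, 11.85),
    "Fatigue": (-4.12, -13.31, 5.06),
    "Depression": (-6.28, -19.28, 6.71),
}

for outcome, (md, lo, hi) in pooled.items():
    se, z = se_and_z_from_ci(md, lo, hi)
    crosses_zero = lo < 0 < hi
    print(f"{outcome}: SE = {se:.2f}, z = {z:.2f}, CI includes 0: {crosses_zero}")
```

All four intervals include zero, so none of these pooled differences is distinguishable from no difference at the conventional 5% level.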

Comparison 2. Flexibility versus control (end of intervention)

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
1 Pain, Intensity, 0‐100, lower is best (end of intervention)	1		Mean Difference (IV, Random, 95% CI)	Totals not selected
2 Physical function, 0‐100, lower is best (end of intervention)	1		Mean Difference (IV, Random, 95% CI)	Totals not selected
3 Withdrawals	1		Risk Ratio (M‐H, Random, 95% CI)	Totals not selected

Comparison 3. Flexibility versus resistance (end of intervention)

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
1 HRQoL, FIQ Total, 0‐100, lower is best (end of intervention)	1		Mean Difference (IV, Random, 95% CI)	Totals not selected
2 Pain, Intensity, 0‐100, lower is best (end of intervention)	3	152	Mean Difference (IV, Random, 95% CI)	1.84 [‐4.15, 7.83]
3 Fatigue, 0‐100, lower is best (end of intervention)	2	122	Mean Difference (IV, Random, 95% CI)	9.83 [‐5.30, 24.97]
4 Physical function, 0‐100, lower is best (end of intervention)	2		Mean Difference (IV, Random, 95% CI)	Totals not selected
5 Depression, 0‐63, lower is best (end of intervention)	2	122	Mean Difference (IV, Random, 95% CI)	0.47 [‐3.40, 4.35]
6 Tenderness, 0‐18, lower is best (end of intervention)	1		Mean Difference (IV, Random, 95% CI)	Totals not selected
7 > 30% improvement of pain (end of intervention)	1	30	Odds Ratio (M‐H, Random, 95% CI)	0.93 [0.21, 4.11]
8 Withdrawals	3	159	Risk Ratio (M‐H, Random, 95% CI)	1.43 [0.77, 2.67]
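
The odds ratio for > 30% improvement of pain comes from a single trial, so it can be checked directly against the counts given in Table 7 (5/14 in the flexibility group, 6/16 in the resistance group). The Python sketch below is a minimal illustration that assumes a plain 2 × 2 odds ratio with a log‐scale (Woolf) 95% CI, which is what a one‐study analysis reduces to; the function name odds_ratio_ci is hypothetical.

```python
import math

def odds_ratio_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio of group A vs group B with a log-scale (Woolf) 95% CI."""
    a, b = events_a, total_a - events_a   # group A: events, non-events
    c, d = events_b, total_b - events_b   # group B: events, non-events
    odds_ratio = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Counts from Table 7: flexibility 5/14, resistance 6/16
or_, lower, upper = odds_ratio_ci(5, 14, 6, 16)
print(f"OR = {or_:.2f} [{lower:.2f}, {upper:.2f}]")  # OR = 0.93 [0.21, 4.11]
```

The result agrees with the effect size reported for outcome 7 in Comparison 3.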

Comparison 4. Flexibility versus other comparators (end of intervention)

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
1 HRQoL, FIQ Total, 0‐100, lower is best (end of intervention)	3		Mean Difference (IV, Random, 95% CI)	Totals not selected
1.1 Flexibility vs Pilates	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
1.2 Flexibility vs Tai Chi	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
1.3 Flexibility vs aquatics	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
2 Pain, Intensity, 0‐100, lower is best (end of intervention)	3		Mean Difference (IV, Random, 95% CI)	Totals not selected
2.1 Flexibility vs Pilates	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
2.2 Flexibility vs Tai Chi	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
2.3 Flexibility vs friction massage	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
2.4 Flexibility vs medication	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
3 Fatigue, 0‐100, lower is best (end of intervention)	2		Mean Difference (IV, Random, 95% CI)	Totals not selected
3.1 Flexibility vs Tai Chi	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
3.2 Flexibility vs aquatics	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
4 Stiffness, 0‐100, lower is best (end of intervention)	2		Mean Difference (IV, Random, 95% CI)	Totals not selected
4.1 Flexibility vs Tai Chi	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
4.2 Flexibility vs aquatics	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
5 Physical function, 0‐100, lower is best (end of intervention)	1		Mean Difference (IV, Random, 95% CI)	Totals not selected
5.1 Flexibility vs aquatics	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
6 Depression, 0‐63, lower is best (end of intervention)	2		Mean Difference (IV, Random, 95% CI)	Totals not selected
6.1 Flexibility vs Tai Chi	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
6.2 Flexibility vs aquatics	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
7 Tenderness, 0‐18, lower is best (end of intervention)	2		Mean Difference (IV, Random, 95% CI)	Totals not selected
7.1 Flexibility vs Pilates	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
7.2 Flexibility vs aquatics	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
8 Withdrawals	4		Risk Ratio (M‐H, Random, 95% CI)	Totals not selected
9 Long‐term effects: flexibility vs other comparators	2		Mean Difference (IV, Random, 95% CI)	Totals not selected
9.1 HRQoL	2		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
9.2 Pain	2		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
9.3 Fatigue	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
9.4 Stiffness	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
9.5 Depression	1		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
9.6 Tenderness	2		Mean Difference (IV, Random, 95% CI)	0.0 [0.0, 0.0]
