1 Introduction

Cost-effectiveness analysis has been a hot topic in health services and outcomes research for at least the last decade. The simplest case, where two treatments are being compared head-to-head, is Incremental Cost-Effectiveness (ICE) inference. The tutorial of Briggs and Fenn (1998) provides a good introduction to and review of the extremely wide variety of methodologies that have been proposed to quantify ICE uncertainty. A synopsis of ICE statistical inference is that it addresses a complex 2-sample, 2-variable problem. Specifically, ICE inference examines differences between two treatment groups using data on two types of outcomes (cost and effectiveness) that may be correlated.

As an aid to visualization in 2-dimensional, real Euclidean space, Black (1990) proposed that the incremental difference (new treatment minus standard treatment) in mean effectiveness, ΔE, be plotted horizontally while the corresponding difference in mean cost, ΔC, is plotted vertically. To depict uncertainty in a (ΔE, ΔC) point estimate (a bivariate statistic), its bootstrap distribution under resampling of patient outcome pairs with replacement within treatment groups is displayed as a scatter of points on the ICE plane; see Briggs and Fenn (1998, pp. 731–734) for a summary of ICE bootstrapping methodology.

The bivariate bootstrap distribution of uncertainty in (ΔE, ΔC) estimates yields a (univariate) confidence interval for the unknown true expected value of any scalar-valued ICE summary statistic. Due to the inherently bivariate nature of the health and financial outcomes being compared, substantial challenges to the use and interpretation of the ICE ratio, ΔC/ΔE, have been noted (Chaudhary and Stearns 1996; Fan and Zhou 2007; Heitjan et al. 1999a, b; Laupacis et al. 1992). Although closely related to the ICE ratio, the Net Benefit (NB) approach (Laska et al. 1999; Stinnett 1999; Stinnett and Mullahy 1998) may have avoided similar criticism because it attempts to quantify overall incremental preference or utility. In NB, constant preference contours (indifference curves) are straight lines on the ICE plane with positive slope, λ. Within the North East (NE) quadrant, this slope can be interpreted as the willingness-to-pay (WTP) a higher cost in return for increased effectiveness; within the South West (SW) quadrant, this same slope is interpreted as the willingness-to-accept (WTA) a less effective treatment in return for lower cost. In other words, NB collects outcomes on the ICE plane into linear equivalence classes (straight lines of constant incremental preference) ordered by a scalar index, the NB.

Unfortunately, the assumption of uniform slope for iso-preference curves is unrealistic in the sense that linear utility has been consistently contradicted in empirical studies (O’Brien et al. 2002; O’Brien and Sculpher 2000; Willan et al. 2001). O’Brien et al. (2002) provide a good introduction to WTP and WTA concepts as well as a highly readable summary of relevant literature. Here we describe a logical foundation for ICE preference quantification (Obenchain 2000, 2001) including a family of two-parameter models that admit complex economic behaviors, such as non-equivalent WTP and WTA. Our axioms are quite general, and our arguments motivating them are both simple and intuitive.

The linear NB map does satisfy all four basic axioms. But our two-parameter family of simple “models” for preference variation across the entire ICE plane provides nonlinear generalizations of NB. For example, see Fig. 2a–c of Sect. 3.6 and the discussions of their interpretation in subsequent sections.

Section 2 introduces basic notation and demonstrates that all commonly considered transformations of the ICE plane, including both treatment re-labeling (new versus standard) and axis rescaling (changes in the shadow price of health), are simple linear transformations with a fixed point at the ICE origin. With λ held fixed, treatment differences in outcome are first standardized by expressing both (ΔE, ΔC) differences in identical units, either both in cost units or else both in effectiveness units. Section 3 then discusses our four basic axioms expressed in standardized units and introduces our 2-parameter family of “signed-power” maps. Section 4 discusses the returns-to-scale properties of these maps. Section 5 defines how either WTP or WTA is related to the slope of the iso-preference contour that passes through any given point on the ICE plane. Section 6 then discusses how “Bernie’s Kink” (O’Brien et al. 2002; Willan et al. 2001), WTP < WTA, is related to the “Gray Areas” in the seminal work of Laupacis et al. (1992) and quantifies the angular size of this kink. Section 7 discusses important distinctions between “ALICE” curves and traditional, linear measures of acceptability using a numerical example. Section 8 then shows that economic uncertainty about shadow price, λ, can totally swamp the statistical uncertainty about the true location of (ΔE, ΔC) within the ICE plane. Finally, Sect. 9 discusses the advantages and practical implications of nonlinear preferences as well as the need for greater consensus on ICE vocabulary and methodology.

2 Basic ICE notation and terminology

2.1 ICE outcomes

An ICE outcome is represented by a pair of expected treatment differences, usually expressed in Cartesian coordinates as (ΔE, ΔC). Here, ΔE is a difference in average treatment effectiveness of the form “new” treatment minus “standard” treatment. The underlying effectiveness measurement needs to be defined in such a way that larger (more positive) values of ΔE are unambiguously more favorable to the new treatment. The corresponding difference in average per-patient cost, ΔC, must be such that smaller (more negative) values are unambiguously more favorable to the new treatment.

2.2 ICE transformations

Transformations of ICE outcome coordinates occur quite naturally. For example, interchanging the labels (new and standard) on the two treatments being compared would multiply both ΔE and ΔC by minus one.

Next, let λ denote society’s fixed “shadow price” for one unit of effectiveness. In other words, λ is a strictly positive substitution rate expressed in units of cost per unit of effectiveness. For any specified value of λ, a cost difference of y = ΔC is re-expressed in effectiveness units by dividing it by λ. Alternatively, the corresponding effectiveness difference of x = ΔE would be expressed in cost units by multiplying it by λ. An arbitrary ICE outcome (ΔE, ΔC) thus gets transformed into either (x, y) = (λΔE, ΔC) in cost units or else into (x, y) = (ΔE, ΔC/λ) in effectiveness units. Either choice represents a standardized, canonical form for expected, overall treatment differences that clearly depends upon choice of the fixed numerical value of λ.
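The re-labeling and standardization steps can be sketched in a few lines of Python. The function names `relabel` and `standardize`, the `units` keyword and the numbers in the example (0.5 QALY, 1000 cost units, λ = 20000 per QALY) are illustrative assumptions, not part of any published software.

```python
def relabel(dE, dC):
    """Swap the 'new' and 'standard' labels: both differences change sign."""
    return -dE, -dC


def standardize(dE, dC, lam, units="cost"):
    """Express an ICE outcome (dE, dC) in identical units for a fixed shadow price lam > 0.

    units="cost":          (x, y) = (lam * dE, dC)
    units="effectiveness": (x, y) = (dE, dC / lam)
    """
    if lam <= 0:
        raise ValueError("the shadow price lam must be strictly positive")
    if units == "cost":
        return lam * dE, dC
    if units == "effectiveness":
        return dE, dC / lam
    raise ValueError("units must be 'cost' or 'effectiveness'")


# Hypothetical outcome: 0.5 extra QALY for 1000 extra cost units, lam = 20000 per QALY
print(standardize(0.5, 1000.0, 20000.0))                   # (10000.0, 1000.0)
print(standardize(0.5, 1000.0, 20000.0, "effectiveness"))  # (0.5, 0.05)
```

Either choice gives the same standardized ratio y/x = ΔC/(λΔE), which is why quantities built on that ratio later in the paper are unitless.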

3 ICE preference maps

Let P(x, y) denote a real-valued function, called an “ICE preference map,” that determines, as explained below, not only which of two treatments is preferred but also the strength of that preference. Specifically, P(x, y) can be visualized as a surface defined over the entire 2-dimensional Euclidean plane, (x, y), of standardized ICE outcomes. Our three primary interpretation conventions (assumptions) for P(x, y) will be as follows:

P(x, y) = 0 means that the (x, y) pair of treatment differences corresponds to no preference whatsoever, either for the new treatment over the standard treatment or vice versa.

P(x, y) > 0 means that the treatment currently called new is preferred over the treatment currently called standard. Strictly positive P(x, y) values are at least ordinal measures of strength of preference for the new treatment over the standard treatment.

P(x, y) < 0 means that the treatment currently called standard is preferred over the treatment currently called new. The absolute values of negative P(x, y) values are at least ordinal measures of strength of preference for the standard treatment over the new treatment.

Table 1 lists our four axiomatic properties of ICE preference maps. When first examining this table, it may be helpful to note that the linear preference map, NB(x, y) = x − y (Stinnett and Mullahy 1998), clearly satisfies all four of these axioms.

Table 1 Four axioms of ICE preference

Note that the re-labeling, symmetry and anti-symmetry axioms represent additional restrictions on ICE preferences only when x ≠ y. After all, the P(x, x) = 0 property of the first axiom makes these three axioms hold trivially for all outcomes with x = y.

The next 5 sub-sections discuss the meaning and interpretation of these four necessary axioms. The final sub-section then introduces our 2-parameter family of preference maps sufficient to satisfy the axioms and provide realistic, nonlinear generalizations of NB.

3.1 ICE indifference and the direction of preference

When x = y, society receives exactly the difference in effectiveness for which it pays, no more and no less. Therefore, there is no compelling reason to prefer one treatment relative to the other, i.e., P(x, y) = 0. If the new treatment is more effective, it is commensurably more costly. If the new treatment is less costly, it is commensurably less effective.

When x > y, the difference in effectiveness of the new treatment compared to the standard exceeds the cost difference between the treatments. Society thus receives a level of incremental effectiveness worth more than its incremental cost. Therefore, the new treatment is preferred over the standard, i.e., P(x, y) > 0.

When x < y, the difference in cost is larger than the difference in effectiveness. Society thus receives a level of incremental effectiveness worth less than its incremental cost. The standard treatment is then preferred over the new treatment, i.e., P(x, y) < 0.

This first axiom is by far the most restrictive of the four considered here. It dictates an infinite, linear interface of standardized slope 1 separating positive from negative ICE preferences. In original units, this is the line ΔC/ΔE = λ with slope determined by the shadow price of health, which ICE analysts may wish to deliberately vary to perform sensitivity analyses that turn out to be anything but subtle.

3.2 ICE monotonicity

Complete preference orderings of all outcomes of the ICE plane are subject to ongoing research and debate. However, a fundamental property of all sensible preference maps is that P(x, y) ≥ P(x₀, y₀) for all x ≥ x₀ and all y ≤ y₀. If the effectiveness of a new treatment is increased at the same time its ultimate cost is decreased, preference for that new treatment over a fixed standard treatment certainly cannot decrease. Remember that we are assuming that x has been defined so that larger (more positive) values of standardized effectiveness are more favorable to the treatment currently called new. Similarly, y must be defined so that smaller (more negative) values of standardized cost are more favorable to the treatment currently called new.

3.3 ICE re-labeling

One meaning of P(x, y) = −P(−x, −y) is that, when reversing treatment labels (new and standard) on a single pair of treatments, the direction of preference is reversed while the strength of preference is preserved. The implications of this axiom are broader in the sense that this same preference equality must also hold when a fixed new treatment is either preferred or not preferred to a fixed standard by a fixed, specified amount. This axiom imposes a form of fairness or even-handedness upon head-to-head ICE treatment comparisons.

3.4 ICE symmetry

Axiom 4 can be expressed in two equivalent ways. Starting with the third axiom plus either form of the fourth axiom, the other form of the fourth axiom follows immediately by simple algebra. For example, the re-labeling property, P(x, y) = −P(−x, −y), can be combined with the symmetry property, P(x, y) = P(−y, −x), to yield P(x, y) = −P(y, x), which is the anti-symmetry property.
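Spelled out, the derivation applies the symmetry axiom first and then the re-labeling axiom at the point (−y, −x):

$$ P(x,y) \overset{\text{Axiom 4}}{=} P(-y,-x) \overset{\text{Axiom 3}}{=} -P\big(-(-y),\,-(-x)\big) = -P(y,x). $$

Conversely, re-labeling at (y, x) gives P(−y, −x) = −P(y, x), so anti-symmetry together with re-labeling returns the symmetry form.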

The ICE symmetry axiom requires that preferences for any pair of outcomes symmetrically located relative to the upper-left to lower-right standardized diagonal, x = −y, of the ICE plane must be identical, P(x, y) = P(−y, −x). In other words, for any (x, y) outcome, the alternative outcome of (−y, −x) must yield the exact same strength of preference in the same direction (new over standard or vice versa). The "named" wedge-shaped segments of the ICE plane discussed in Laupacis et al. (1992) and illustrated here in Fig. 1 were apparently the first depictions of ICE preference symmetry.

Fig. 1

The wedge-shaped regions of the ICE Plane discussed in Laupacis et al. (1992) are symmetrically positioned relative to the x = −y diagonal

Suppose now that (x₀, y₀) is any fixed point within the NE quadrant, 0 < x₀ and 0 < y₀. As a result, (−y₀, −x₀) is then a fixed point within the SW quadrant, as illustrated in Fig. 1. Denoting the standardized ICE ratio (slope) corresponding to (x₀, y₀) by s₀ = y₀/x₀ > 0, it follows that the standardized slope corresponding to the outcome pair (x, y) = (−y₀, −x₀) is s = y/x = −x₀/−y₀ = +1/s₀ > 0. In other words, one immediate implication of the ICE symmetry axiom is that all (x₀, y₀) and (−y₀, −x₀) pairings with equivalent preferences have standardized ICE ratios, s = y/x, that are numerical reciprocals or inverses.

This inverse relationship has considerable intuitive appeal. Within the NE quadrant, s = y/x is a positive “loss over gain” ratio; the positive numerator represents an undesirable additional cost (loss) while the positive denominator represents a desirable increase in effectiveness (gain). Meanwhile, within the SW quadrant, s = y/x is a positive “gain over loss” ratio; the negative numerator represents a desirable cost reduction (gain) while the negative denominator represents an undesirable reduction in effectiveness (loss).

In other words, numerically small and positive standardized ICE ratios are desirable within the NE quadrant where they represent loss/gain ratios, while numerically large and positive standardized ICE ratios are desirable within the SW quadrant (Heitjan et al. 1999a, b) where they represent gain/loss ratios. In Fig. 1, these two regions are represented by the wedges labeled "Favorable (B)" that are colored yellow-green. By ensuring that outcomes within the NE and SW quadrants that yield equivalent preferences also yield standardized ICE ratios that are numerical reciprocals (y₀/x₀ and −x₀/−y₀ = x₀/y₀), the ICE symmetry axiom simply formalizes basic intuition.

Within the South East (SE) quadrant [dark green, labeled “Highly Favorable (A)” in Fig. 1], s = y/x is a negative “gain over gain” ratio; the negative numerator represents a desirable cost reduction while the positive denominator represents a desirable increase in effectiveness for the new treatment. All ICE ratios for outcome differences within the SE quadrant thus represent a distinct preference for the new treatment over the standard.

Finally, within the North West (NW) quadrant [dark red, labeled “Highly Unfavorable (E)” in Fig. 1], s = y/x is a negative “loss over loss” ratio; the positive numerator represents an undesirable added cost while the negative denominator represents an undesirable reduction in effectiveness for the new treatment. All NW quadrant outcome differences thus represent a distinct preference for the standard treatment over the new treatment.

Next, note that the linear preference map, NB(x, y) = x – y, possesses a purely optional property that is much stronger and more restrictive than P(x, y) = P(−y, −x). This linear NB preference is constant everywhere on the straight line passing through the points (x, y) and (−y, −x). Again, when one’s preference map is linear, preference is assumed constant on all straight lines (x – y = constant) that are parallel to the lower-left to upper-right diagonal (x = y) of the ICE plane.

Finally, note that the ICE preference symmetry property, P(x, y) = P(−y, −x), does impose an additional restriction besides reciprocal ICE ratios, y/x and −x/−y = x/y, for the corresponding outcome pairings. Namely, all such outcome pairs clearly also have the same ICE radius, \( {\text{r}} = {\sqrt {{\text{x}}^{2} + {\text{y}}^{2} } }. \)

3.5 ICE anti-symmetry

As previously noted, the ICE anti-symmetry axiom can be viewed as following directly from the re-labeling and symmetry restrictions. In its own right, the anti-symmetry requirement that P(x, y) = −P(y, x) is quite intuitive. It requires symmetry in strength of preferences about the x = y diagonal. However, this property is called anti-symmetry here because the direction of preferences is reversed on the two different sides of the x = y diagonal. After all, when pairs of outcomes of the form (x, y) and (y, x) are not on the x = y diagonal, they are symmetrically located relative to this x = y diagonal.
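Because the four axioms are identities over the whole ICE plane, a finite grid of standardized outcomes can only refute them, never prove them. The Python sketch below, a hypothetical helper named `check_axioms` with an assumed grid and step `h`, spot-checks all four axioms for any candidate map. NB passes; the made-up map x − 2y, which puts indifference on the wrong line, fails axioms 1 and 4.

```python
import math


def check_axioms(P, grid, h=0.1, tol=1e-12):
    """Spot-check the four ICE preference axioms for a map P on a finite grid.

    Returns a dict of booleans; a False entry is a genuine counter-example,
    while True only means no counter-example was found on this grid.
    """
    ok = {"direction": True, "monotone": True, "relabel": True, "symmetry": True}
    for x in grid:
        for y in grid:
            p = P(x, y)
            # Axiom 1: sign of P follows the sign of x - y (zero on the diagonal)
            expected = (x > y) - (x < y)
            actual = (p > tol) - (p < -tol)
            if actual != expected:
                ok["direction"] = False
            # Axiom 2: more effectiveness or less cost never lowers preference
            if P(x + h, y) < p - tol or P(x, y - h) < p - tol:
                ok["monotone"] = False
            # Axiom 3: re-labeling flips the direction but keeps the strength
            if not math.isclose(p, -P(-x, -y), abs_tol=tol):
                ok["relabel"] = False
            # Axiom 4: symmetry about the x = -y diagonal
            if not math.isclose(p, P(-y, -x), abs_tol=tol):
                ok["symmetry"] = False
    return ok


grid = [i / 10 for i in range(-20, 21)]
print(check_axioms(lambda x, y: x - y, grid))      # NB: all four True
print(check_axioms(lambda x, y: x - 2 * y, grid))  # hypothetical counter-example
```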

3.6 Two-parameter ICE preference maps

To generalize the linear preference map NB(x, y) = x − y, let us now consider the family of ICE preference maps of the form

$$ {\text{P}}({\text{x,y}}) \propto ({\text{x}}^{2} + {\text{y}}^{2} )^{{(\beta - \gamma )/2}} {\left\{ {{\text{x}} - {\text{y}}} \right\}}^{\gamma } , $$
(1)

where ∝ means "is proportional to," β and γ are strictly positive "power" parameters, and the special notation \( \{z\}^{\gamma} \) denotes a "signed-power." Specifically, \( \{z\}^{\gamma} \) denotes the product of sign(z) [which is +1, 0 or −1] times the absolute value of z raised to the power γ. Special care needs to be taken in Eq. 1 because non-integer powers of negative real numbers are generally not real numbers; ICE preferences need to be expressed as real numbers, with possibly only ordinal measures of strength.

It is straight-forward to verify that all ICE maps of form (1) satisfy axioms 1, 3 and 4 of Table 1. In the Appendix, we show that the following range restriction on the ratio of the β and γ power parameters,

$$ 1/\Upomega \le \gamma /\upbeta \le \Upomega \,{\text{for}}\,\Upomega {\text{ = (1 + }}{\sqrt {\text{2}} }{\text{)}}^{{\text{2}}} \approx 5.828, $$
(2)

is necessary and sufficient for maps defined by Eq. 1 to also satisfy axiom 2, ICE monotonicity. The class of two-parameter ICE preference maps satisfying Eqs. 1 and 2 is our proposed “signed-power” family.
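The following Python sketch evaluates Eq. 1 with the proportionality constant set to 1 (an assumption; it does not affect the sign or ordering of preferences) and probes restriction (2) with finite differences on a grid. The names `signed_power`, `preference` and `monotone_on_grid`, the grid and the step h = 0.1 are illustrative choices, not the R code behind Fig. 2. `math.copysign` keeps the signed-power real for negative arguments.

```python
import math


def signed_power(z, g):
    """{z}^g = sign(z) * |z|**g, always real (0 when z == 0)."""
    return math.copysign(abs(z) ** g, z) if z != 0 else 0.0


def preference(x, y, beta, gamma):
    """Signed-power ICE preference map of Eq. 1 with proportionality constant 1."""
    if x == y:  # indifference line, including the origin where r**(beta - gamma) may blow up
        return 0.0
    return (x * x + y * y) ** ((beta - gamma) / 2) * signed_power(x - y, gamma)


def monotone_on_grid(beta, gamma, grid, h=0.1, tol=1e-12):
    """Finite-difference check of axiom 2; a grid can refute monotonicity but not prove it."""
    for x in grid:
        for y in grid:
            p = preference(x, y, beta, gamma)
            if preference(x + h, y, beta, gamma) < p - tol:  # more effectiveness
                return False
            if preference(x, y - h, beta, gamma) < p - tol:  # less cost
                return False
    return True


OMEGA = (1 + math.sqrt(2)) ** 2              # about 5.828
grid = [i / 10 for i in range(-20, 21)]
print(monotone_on_grid(1.0, 4.0, grid))      # eta = 4 lies inside [1/OMEGA, OMEGA]
print(monotone_on_grid(1.0, 8.0, grid))      # eta = 8 exceeds OMEGA
```

With η = 4 no violation is found, while η = 8 is refuted at, for example, (x, y) = (1, −0.4), where increasing x lowers preference.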

Figures 2a–d depict ICE preference maps using equally spaced indifference curves (level curves, iso-preference contours) drawn using the contourplot() function within the "lattice" graphics package for R (The R Project for Statistical Computing 2007). Rather round nonlinear maps like the one in Fig. 2a result when γ < β; the linear map, NB(x, y) = x – y, of Fig. 2b results when γ = β = 1; and highly directional nonlinear maps like those of Fig. 2c–d result when γ > β. Note that the γ/β ratio of 0.25 for the map of Fig. 2a is well above the 1/Ω ≈ 0.1716 lower limit allowed under restriction (2), while γ/β = 4 of Fig. 2c is well below the Ω ≈ 5.828 upper limit for maps possessing ICE monotonicity.

Fig. 2

(a) Level curves of the ICE preference map with β = 1.0 and γ = 0.25. (b) Linear level curves used in net benefit analysis (β = γ = 1). (c) Level curves of the ICE preference map with β = 0.25 and γ = 1. (d) Level curves of the ICE preference map with β = 0.5 and γ = 2.914 (η = 5.828)

4 Returns-to-scale

Suppose now that the observed treatment differences in cost, y, and effectiveness, x, are somehow both multiplied by a strictly positive and finite real valued factor f. In other words, the observed effectiveness difference of x becomes f times x, while the observed cost difference of y becomes f times y. The resulting new value of preference in Eq. 1 is then

$$ P(f{\text{x}},\,f{\text{y}}) \propto f^{{(\upbeta - \gamma ) + \gamma }} {\left[ {{\text{x}}^{2} + {\text{y}}^{2} } \right]}^{{(\upbeta - \gamma )/2}} {\left\{ {{\text{x}} - {\text{y}}} \right\}}^{\gamma } \propto f^{\upbeta} {\text{P(x,y)}}.$$
(3)

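In other words, for every map in our 2-parameter family, returns-to-scale depend solely upon the β power parameter. Indeed, Eq. 1 can be rewritten as \( r^{\beta} \{(x - y)/r\}^{\gamma} \) with \( r = \sqrt{x^2 + y^2} \), so β is the overall power of the ICE radius while the γ factor depends only on direction. Specifically, returns-to-scale will be: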

$$ \begin{array} {ll} {\text{decreasing}} & {\text{if}} \,\, 0 < \beta < 1, \\ {\text{constant}}\,({\text{linear}}) & {\text{if}}\,\, \beta = 1,\ {\text{and}} \\ {\text{increasing}} & {\text{if}}\,\, 1 < \beta < +\infty . \\ \end{array} $$
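A quick numerical check of Eq. 3, again a sketch that assumes a proportionality constant of 1 and an arbitrary test point (2, −1) with scale factor f = 3: scaling both standardized differences by f multiplies preference by f^β, whatever the value of γ.

```python
import math


def preference(x, y, beta, gamma):
    """Signed-power map of Eq. 1 with proportionality constant 1 (illustrative)."""
    if x == y:
        return 0.0
    z = x - y
    return (x * x + y * y) ** ((beta - gamma) / 2) * math.copysign(abs(z) ** gamma, z)


x, y, f = 2.0, -1.0, 3.0
for beta in (0.5, 1.0, 2.0):  # decreasing, constant and increasing returns-to-scale
    ratio = preference(f * x, f * y, beta, gamma=1.5) / preference(x, y, beta, gamma=1.5)
    print(beta, ratio, f ** beta)  # the ratio matches f**beta up to rounding
```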

5 Willingness-to-pay or accept

WTP or WTA at any point on the ICE plane is assumed here to be determined by the iso-preference contour that passes through that given point. In fact, we define a standardized “willingness” rate (of WTP/λ within the NE quadrant or WTA/λ within the SW quadrant) as being equal to the dy/dx slope of the tangent to the iso-preference contour at the point of interest. This is fully consistent with NB analysis in which iso-preference contours are straight lines of slope WTP = WTA = λ. For example, the standardized value for all three of these quantities is +1 at all points in Fig. 2b.

As shown in the Appendix, the standardized willingness rate at (x, y) for all signed-power ICE preference maps of form (1) is

$$ \begin{aligned}{} & {\text{w}}({\text{x}},{\text{y}}) = [\beta {\text{x}}^{2} + (\gamma - \beta ){\text{xy}} + \gamma {\text{y}}^{2} ]/[\gamma {\text{x}}^{2} + (\gamma - \beta ){\text{xy}} + \beta {\text{y}}^{2} ] \\ & = [1 + (\eta - 1){\text{s}} + \eta {\text{s}}^{2} ]/[\eta + (\eta - 1){\text{s}} + {\text{s}}^{2} ], \\ \end{aligned} $$
(4)

where s = y/x is the standardized ICE ratio and η = γ/β is the map “power-parameter ratio.” Remembering that (x, y) denotes either (λΔE, ΔC) in cost units or (ΔE, ΔC/λ) in effectiveness units, it follows that w(x, y) represents either

[a] a non-negative value of WTP/λ when ΔE, x, ΔC and y are all positive

or else

[b] a non-negative value of WTA/λ when ΔE, x, ΔC and y are all negative.

Since β and γ are unitless parameters and x and y are both measured here in the same units, it follows that η = γ/β, s = y/x and w(x, y) are all unitless quantities.

Note that the willingness rate evaluated at any point (x, y) is generally different from the standardized ICE ratio, s = y/x, corresponding to that point. On the other hand, when η = 1, w(x, y) ≡ 1 is a fixed value for all points (x, y) and all directions s = y/x.

For any fixed value of η different from 1, the standardized willingness rate (4) varies only with s. In other words, standardized willingness is then constant everywhere along every straight-line trajectory, s, passing through the origin of the ICE plane …except at the ICE origin itself. After all, neither w(0, 0) nor s = y/x are well defined at the ICE origin! Unlike ICE preferences, P(x, y), standardized willingness clearly does not vary with ICE radius, \( {\text{r}} = {\sqrt {{\text{x}}^{2} + {\text{y}}^{2} } }, \) within the 2-parameter maps of Eq. 1.

It can be shown that, when η ≠ 1, the maximum and minimum values of w(x, y) in (4) are \( [2 + \kappa(\eta^{1/2} - \eta^{-1/2})]/[2 - \kappa(\eta^{1/2} - \eta^{-1/2})] \) for κ = ±1; both limits are non-negative when 1/Ω ≤ η ≤ Ω, which is also the restriction (2) that assures ICE monotonicity. The corresponding standardized directions are \( s = (1 + \kappa\eta^{1/2})/(1 - \kappa\eta^{1/2}) \), which are both positive when η < 1 as in Fig. 2a and both negative when η > 1 as in Fig. 2c–d. The maximum and minimum values for w(x, y) are +∞ and 0 only in the limiting case η = Ω ≈ 5.828, and the pair of directions pointing to these limits then have (reciprocal) negative slopes of \( s = -(1 + \sqrt{2}) \) and \( s = 1 - \sqrt{2} \) (i.e., ICE Angles of θ = ±22.5° in Figs. 2d and 3).

Fig. 3

A pair of standardized “dual rays” that (i) contain the same distributions of ICE preferences, (ii) correspond to equal absolute ICE polar angles relative to the x = −y diagonal, and (iii) have both standardized slopes (s = y/x) and standardized willingnesses (w = WTP/λ or WTA/λ) that are numerical reciprocals. The case depicted here corresponds to 0 < s < w < 1 < 1/w < 1/s because η = γ/β > 1 as in Fig. 2c

The (un-standardized) ICE willingness rate takes on 3 different, simple forms in the following special cases when η ≠ 1:

$$ \begin{aligned}{} {\text{W}}{\left( {\Updelta {\text{E}},\Updelta {\text{C}}} \right)} & = \lambda \quad {\text{when s}} = \pm 1{\left( {{\text{x}} = {\text{y}} \ne {\text{0 or x}} = - {\text{y}} \ne {\text{0}}} \right)}, \\ & = \lambda /\eta \quad {\text{when s}} = 0{\left( {{\text{y}} = 0\,{\text{and x}} \ne {\text{0}}} \right)}, \\ & = \lambda \eta \quad {\text{as s approaches }} \pm \infty {\left( {{\text{x}} = 0\,{\text{and y}} \ne {\text{0}}{\text{.}}} \right)} \\ \end{aligned} $$
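Equation 4, its closed-form extrema and the special cases above are easy to evaluate numerically. The helpers below are hypothetical names, and η = 4 (the ratio of Fig. 2c) is used purely as an example; for that value the extrema are 7 and 1/7, attained at s = −3 and s = −1/3.

```python
import math


def willingness(s, eta):
    """Standardized willingness w of Eq. 4 along direction s = y/x, with eta = gamma/beta."""
    return (1 + (eta - 1) * s + eta * s * s) / (eta + (eta - 1) * s + s * s)


def willingness_extrema(eta):
    """(w_extreme, s_direction) pairs for kappa = +1 and kappa = -1 (requires eta != 1)."""
    root = math.sqrt(eta)
    d = root - 1 / root
    pairs = []
    for kappa in (+1, -1):
        w_ext = (2 + kappa * d) / (2 - kappa * d)
        s_dir = (1 + kappa * root) / (1 - kappa * root)
        pairs.append((w_ext, s_dir))
    return pairs


eta = 4.0
for w_ext, s_dir in willingness_extrema(eta):
    # closed-form extreme value, its direction, and Eq. 4 evaluated there
    print(round(w_ext, 4), round(s_dir, 4), round(willingness(s_dir, eta), 4))
# Special cases: w = 1 at s = +1 or -1, and w = 1/eta at s = 0
print(willingness(1.0, eta), willingness(-1.0, eta), willingness(0.0, eta))  # 1.0 1.0 0.25
```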

6 Symmetry, dual ICE rays and (WTP, WTA) pairings

The ICE symmetry axiom (Obenchain 2000, 2001) dictates a number of key relationships. Again, the major implication of P(x, y) = P(−y, −x) is perhaps that these symmetric outcomes have standardized slopes, s = y/x, that are numerical reciprocals. But Eq. 4 then implies that w(x, y) and w(−y, −x) are also numerical reciprocals for all ICE maps of Eq. 1. The “dual rays” of Fig. 3 then consist of all standardized ICE outcome pairs of the form (fx, fy) and (−fy, −fx) with (x, y) fixed and ray slopes of s = y/x and 1/s, respectively, as f increases from 0 to +∞. Any such pair of dual rays also contains not only the same distribution of preference strengths (as a function of ICE radius) but also the same direction of preference (either always new over standard or vice versa).

In un-standardized units, however, the ΔC/ΔE ratios corresponding to any such pair of dual rays are λs and λ/s, respectively, which are not reciprocals unless λ = 1. Similarly, the un-standardized willingness statistics WTP = λw and WTA = λ/w are the slopes of the iso-preference contours at all points where they cross one of these two rays. These un-standardized willingness slope pairings are also not reciprocals unless λ = 1. Note, furthermore, that these relationships always hold, for all choices of returns-to-scale, β, in (3) and all power parameter ratios, η = γ/β. Finally, the choice of λ determines the orientation of the basic pattern of preference variation across the ICE plane and is clearly at least as important as the choice of either β or η.

The single most important implication of the ICE symmetry axiom appears to be that well-matched pairs of WTP and WTA values always have the property that WTP times WTA equals λ². In other words, for our signed-power ICE preference maps (or any differentiable ICE preference map satisfying the symmetry axiom), the following relationship will always hold:

$$ \lambda = {\sqrt {{\text{WTP}} \times {\text{WTA}}} }. $$
(5)

Equation 5 states that the shadow price of health is the geometric mean of all well-matched pairs of strictly positive WTP and WTA values. In other words, Eq. 5 shows that WTP and WTA can both vary simultaneously within a fixed, nonlinear ICE preference map corresponding to a single fixed value of λ. In this sense, the choices of the β and η (or γ) parameters are clearly less important than the choice of shadow price, λ.
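Equation 5 can be checked numerically on a single pair of dual rays. In this sketch the shadow price λ = 20000, the power-parameter ratio η = 4 and the NE ray slope s = 0.5 are hypothetical values chosen only for illustration.

```python
import math


def willingness(s, eta):
    """Standardized willingness of Eq. 4 (eta = gamma / beta)."""
    return (1 + (eta - 1) * s + eta * s * s) / (eta + (eta - 1) * s + s * s)


lam, eta = 20000.0, 4.0               # hypothetical shadow price and power ratio
s = 0.5                               # slope of a NE ray; its dual SW ray has slope 1/s
wtp = lam * willingness(s, eta)       # contour slope where the NE ray is crossed
wta = lam * willingness(1 / s, eta)   # contour slope on the dual SW ray (equals lam / w)
print(round(wtp, 2), round(wta, 2), math.sqrt(wtp * wta))  # geometric mean recovers lam
```

Because η > 1 and s < 1 here, the sketch also reproduces the realistic ordering WTP < λ < WTA.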

6.1 An additional “realism” restriction is still needed

Figure 3 portrays an accurate visualization of the numerical ordering between w, s and their reciprocals (i.e., 0 < s < w < 1 < 1/w < 1/s) only in cases where the η = γ/β ratio is > 1 in Eq. 4, as in the "highly directional" nonlinear map depicted in Fig. 2c. Unfortunately, η < 1 implies instead that 1/w < 1 < w whenever 0 < s < 1, which yields WTA < λ < WTP below the x = y diagonal in the rather "round" maps (η < 1) like Fig. 2a. This alternative ordering has apparently never been observed in empirical research on WTP and WTA (O'Brien et al. 2002; Willan et al. 2001).

In summary then, only the nonlinear ICE preference maps of form (1) with power parameter ratio, η = γ/β, confined to the finite interval of \( 1 < \eta \le \Upomega = 3 + 2{\sqrt 2 } \) can be fully realistic. And only the limiting maps with η = Ω allow willingness (standardized or un-standardized) to vary all of the way from 0 to +∞.

6.2 The Laupacis visualization of ICE preferences corresponds to the β = 0 limit

Figure 1 depicts the limit of our signed-power family of ICE preference maps of Eq. 1 as the returns-to-scale parameter, β, approaches zero (while the γ parameter is held fixed at any finite value). In other words, η then approaches +∞ and the standardized willingness of Eq. 4 becomes w = (s + s²)/(1 + s) = s in this limit. While failing to possess ICE monotonicity and allowing negative values for w in Eq. 4 within the SE and NW quadrants, these limiting maps still have iso-preference curves that correspond to pairs of dual rays with reciprocal slopes. They also order preferences on ICE polar angle in the exact same way that they are ordered on all of our β > 0 maps for outcomes at the same ICE radius.

On the other hand, these limiting (β = 0, γ > 0) maps are not very realistic precisely because they yield zero returns-to-scale. In other words, they ignore ICE radius as a potential, partial determinant of preference …especially within the SE and NW quadrants.

6.3 Symmetry and literature on (WTP, WTA) pairings

Preliminary standardization, in which (x, y) represents either (λΔE, ΔC) in cost units or (ΔE, ΔC/λ) in effectiveness units for a fixed numerical value of λ, is essential to be able to express not only standardized directions, s = y/x, but also standardized willingnesses, w, as unitless quantities. In turn, unitless quantities become absolutely essential when reciprocals are to be compared. After all, if the quantities being compared were not unitless, the statistic and its reciprocal would be expressed in different units, such as $/QALY and QALY/$, and clearly could not be meaningfully compared!

The unitless ratio, WTA/WTP > 1, has been proposed (O'Brien et al. 2002; Willan et al. 2001) as the primary measure of the size of the observed "kink" in consumers' thresholds. Using relationship (5), functions of this WTA/WTP ratio can be given simple geometric interpretations consistent with the Laupacis et al. (1992) visualization of preferences. Specifically, the standardized slope (w = s) of any WTP ray in Fig. 1 (β = 0, η = +∞) is \( {\text{s}} = {\sqrt {{\text{WTP}}/{\text{WTA}}} } < 1, \) while the standardized slope of the corresponding WTA ray is \( {\text{1/s}} = {\sqrt {{\text{WTA}}/{\text{WTP}}} } > 1. \) Similarly, the angular sizes of the "Favorable (B)" wedges and "Unfavorable (D)" wedges are all equal, again as depicted in Fig. 1. The size of each of these wedges in degrees is ArcTan(s), measured in radians, multiplied by 180/π. Table 2 lists these measures for 10 of the 20 programs or products reported in Table 1 of O'Brien et al. (2002).

Table 2 WTP-WTA statistics for empirical studies from Table 1 of O’Brien et al. (2002)

Note in Table 2 that the "Favorable (B)" and "Unfavorable (D)" wedges within the NE and SW quadrants are narrowest in the 7 environmental studies and widest, approaching the maximum possible size of 45°, in some of the 4 safety and 7 experimental studies. For example, when this angle is only 6°, as in the "Trees in park" study, the corresponding Laupacis et al. (1992) "Gray Area (C)" wedges each measure 90° − 2 × 6° = 78°. At the other extreme of 43° for the "Favorable (B)" and "Unfavorable (D)" wedges in the "Job safety (VSL)" study, the "Gray Area (C)" wedges measure only 90° − 2 × 43° = 4° each.
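The wedge arithmetic can be written as a small function. This is a sketch under the β = 0 interpretation of Fig. 1, where w = s; the function name `laupacis_wedges` is hypothetical, and the example inverts the 6° and 43° wedges quoted in the text rather than using Table 2 values.

```python
import math


def laupacis_wedges(wta_over_wtp):
    """Angular sizes in degrees of each Favorable (B) / Unfavorable (D) wedge and of each
    Gray Area (C) wedge implied by a ratio WTA/WTP >= 1, using s = sqrt(WTP/WTA)."""
    s = math.sqrt(1 / wta_over_wtp)      # standardized slope of the WTP ray
    b_deg = math.degrees(math.atan(s))   # each B (and D) wedge
    c_deg = 90 - 2 * b_deg               # each gray-area wedge within the NE or SW quadrant
    return b_deg, c_deg


for wedge in (6, 43):
    # WTA/WTP ratio implied by a B wedge of the given size, then the wedges it implies
    ratio = 1 / math.tan(math.radians(wedge)) ** 2
    print(round(ratio, 2), [round(a) for a in laupacis_wedges(ratio)])
```

A ratio of 1 (no kink) gives 45° wedges and no gray area, the NB case.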

Note that Eq. 5 does not actually establish numerical values for WTP, WTA, η, w or s but only a relationship between WTP, WTA and the shadow price of health, λ. In other words, WTP = WTA = λ is always one possibility. This is the only possibility in the purely linear NB formulation (β = γ = η = 1). In general, the cases where WTP = WTA = λ and w ≡ 1 because η = 1, including the linear map shown in Fig. 2b, are the only visualizations in which society is willing to pay the full shadow price of health.

Another interesting and realistic possibility is WTP < λ < WTA, as in Figs. 1 and 2c–d as well as in Table 2. In these cases a bargain-seeking society or individual may be willing to pay only somewhat less than an established "fair" full price, which could force providers to accept lower per-item profits while perhaps seeking higher volumes.

7 Nonlinear acceptability

The “acceptability curve” graph was originally proposed in 1994 by Van Hout, Al, Gordon and Rutten (VAGR) (Van Hout et al. 1994) to portray ICE uncertainty. The input is either (i) a parametric, bivariate distribution (normal, say) with mean (ΔE, ΔC) that has been fitted to observed patient-level data or (ii) a bootstrap resampling distribution of ICE uncertainty. The VAGR curve then depicts the estimated “confidence level” associated with the region to the right of or below a straight line through the ICE origin. This line starts out horizontal (representing WTP = 0) and rotates counter-clockwise by 90°, ending up vertical (representing WTP = +∞). In 2004, Fenwick, O’Brien and Briggs (FOB) (Fenwick et al. 2004) cataloged as many as 13 “special cases” yielding VAGR curves with quite different shapes, ranging from rather flat, to increasing, to decreasing, to distinctly non-monotone.
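Given a bootstrap scatter, the VAGR level at a single value of WTP is the share of (ΔE, ΔC) replicates with positive linear net benefit, WTP × ΔE − ΔC > 0. The sketch below assumes the scatter is held as a Python list of (ΔE, ΔC) tuples. `vagr_acceptability` is a hypothetical helper. Points lying exactly on the line are counted as not acceptable, which is an arbitrary tie-breaking choice.

```python
import math

def vagr_acceptability(pairs, wtp):
    """VAGR level: share of bootstrap (dE, dC) pairs strictly to the right of or
    below the line through the ICE origin with slope wtp (WTP = WTA = wtp)."""
    if math.isinf(wtp):
        # Limiting vertical line: acceptable means more effective (dE > 0).
        hits = sum(1 for d_e, _ in pairs if d_e > 0)
    else:
        # Linear net benefit wtp*dE - dC must be strictly positive.
        hits = sum(1 for d_e, d_c in pairs if wtp * d_e - d_c > 0)
    return hits / len(pairs)

# Hypothetical four-point scatter, one replicate per quadrant (SE, NE, SW, NW).
pts = [(1, -1), (1, 0.5), (-1, -2), (-1, 1)]
print([vagr_acceptability(pts, w) for w in (0.0, 1.0, math.inf)])
```

At WTP = 0 the level is the combined SE + SW share. As WTP grows without bound, it tends to the SE + NE share.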

An acceptability curve that is always monotone non-decreasing results from the unpublished alternative definition of acceptability independently proposed by me in 2001 and by Professor Ken Buckingham of Otago University, New Zealand, in 2003. My freely distributed software (Obenchain 2005, 2007) uses Buckingham’s terminology for Acceptability Levels In Cost Effectiveness (ALICE) curves. For any given and fixed positive value of λ, the ALICE frontier is defined using a pair of “kinked” dual ICE rays. These rays remain symmetric relative to the x = −y diagonal while rotating so that their absolute ICE polar angle, \(|\theta| \), is constantly increasing. Table 3 compares the VAGR and ALICE definitions within all four quadrants of the ICE plane.

Table 3 VAGR and ALICE definitions of acceptability

In the notation of Table 3, the standardized ICE slope, s, is a unitless quantity that increases from 0 towards +∞, while λ denotes a given, fixed value for the shadow price of health. Within the VAGR column of definitions, the product λs thus denotes a variable quantity. Its successive values are different common values of shadow price = WTP = WTA, and each defines a different linear VAGR threshold for acceptability. Within the ALICE column of definitions, WTP = λs increases with s within the NE quadrant while WTA = λ/s simultaneously decreases with s within the SW quadrant. Together these define a range of kinked ALICE thresholds for acceptability that clearly satisfy Eq. 5. This ALICE definition of acceptability agrees with the sum of double integrals in Willan et al. (2001, p. 3255). Technically, interest could even be restricted to the finite range 0 ≤ s ≤ 1 for ALICE curves, because s > 1 corresponds to WTA < λ < WTP, again an ordering that has apparently never been observed empirically.
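Read literally, the ALICE rules just described can be sketched as follows. The sketch works in standardized coordinates x = λΔE and y = ΔC, so that WTP = λs becomes slope s and WTA = λ/s becomes slope 1/s. It follows the prose description above and assumes the quadrant-by-quadrant entries of Table 3 match it. It is not the ICEplane code. `alice_acceptability` is a hypothetical name, and the four-point scatter is hypothetical.

```python
import math

def alice_acceptability(pairs, lam, s):
    """Share of bootstrap (dE, dC) pairs below/right of the kinked ALICE frontier.

    lam : fixed shadow price of health (cost units per effectiveness unit)
    s   : standardized slope, 0 <= s <= math.inf; WTP = lam*s, WTA = lam/s
    """
    hits = 0
    for d_e, d_c in pairs:
        x, y = lam * d_e, d_c                 # both axes now in cost units
        if x > 0 and y < 0:                   # SE: acceptable for every s
            hits += 1
        elif x > 0 and y >= 0:                # NE: standardized slope y/x < s
            if math.isinf(s) or y < s * x:
                hits += 1
        elif x < 0 and y < 0:                 # SW: standardized slope y/x > 1/s
            if s > 0 and (math.isinf(s) or y * s < x):
                hits += 1
        # Remaining points (NW quadrant, vertical axis) count as not acceptable.
    return hits / len(pairs)

pts = [(1, -1), (1, 0.5), (-1, -2), (-1, 1)]   # hypothetical SE, NE, SW, NW
print([alice_acceptability(pts, 1.0, s) for s in (0.0, 0.4, 0.6, math.inf)])
```

Raising s can only add NE and SW points and never removes any. This is why the resulting curve cannot decrease.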

Note in Table 3 that the VAGR and ALICE definitions of acceptability differ only within the SW quadrant. VAGR and ALICE curves thus contain the same basic information (displayed using different horizontal axes) whenever the ICE uncertainty distribution attributes zero credibility to the SW quadrant. At the other extreme, where 100% credibility is attributed to the SW quadrant, the VAGR and ALICE curves are again equivalent, but the VAGR curve would then be decreasing while the ALICE curve is increasing, as usual. In other cases, we will see that VAGR curves are non-monotone and biased.

Cases where the ICE uncertainty distribution attributes credibility not only to the SE quadrant but also to the most desirable parts of both the SW and NE quadrants are particularly important. We will now consider a numerical example of this “high uncertainty” type.

7.1 Comparison of VAGR and ALICE curves for a high uncertainty example

Obenchain et al. (2005) use an example where missing-data imputation and sensitivity analyses are needed to make meaningful cost-effectiveness comparisons. The data are from a registration trial (Goldstein et al. 2004) that compares duloxetine, a serotonin and norepinephrine reuptake inhibitor (SNRI), with paroxetine, a standard selective serotonin reuptake inhibitor (SSRI), for treatment of major depressive disorder (MDD). To keep things as simple as possible, we describe here only head-to-head comparisons between the 91 patients randomized to duloxetine 80 mg/d (40 mg BID) and the 87 patients randomized to paroxetine 20 mg/d. The corresponding bootstrap ICE uncertainty distribution illustrated in Fig. 2b of Obenchain et al. (2005) is displayed here in Fig. 4. It is shown in the “standardized” form explained in the next three paragraphs.

Fig. 4

First 1,000 out of 25,000 replications quantifying the bootstrap distribution of ICE uncertainty for the “DulxParx.ICE” numerical example distributed with my “ICEplane” Windows software (random number generator seed = 12345). Both axes are standardized here to units of $/Week using an assumed value for the shadow price of health of λ = 0.26. ICE Quadrant Confidence Levels (Obenchain 1999, 2005) are also displayed.

Patient self-reported health-care utilization beyond that provided within the study protocol was collected using the Resource Utilization Survey (Copley-Merriman et al. 1992). It was costed using published 1998 $/unit costs (Schoenbaum et al. 2001) rounded to the nearest multiple of $50.00. $/Week was then calculated by multiplying the total accumulated cost for a patient by 7 and dividing by that patient’s total days of cost accumulation. For patients who discontinued from the study, this amounts to Average-Value-Carried-Forward imputation.
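The per-patient $/Week rate described above is a single ratio. In the sketch below, the function name and the $300-over-42-days patient are hypothetical.

```python
def cost_per_week(total_cost, days_of_accumulation):
    """Weekly cost rate: 7 * (total accumulated cost) / (days of cost accumulation).

    For a patient who discontinued early, using this observed rate as the
    patient's weekly cost is the average-value-carried-forward imputation
    described in the text."""
    if days_of_accumulation <= 0:
        raise ValueError("days_of_accumulation must be positive")
    return 7.0 * total_cost / days_of_accumulation

# Hypothetical patient: $300 of extra-protocol costs over 42 days on study.
print(cost_per_week(300.0, 42))
```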

Measures of effectiveness in this study were derived from blinded clinical assessments on the Hamilton Depression Rating Scale (Hamilton 1967). Missing values were imputed via MMRM models (Goldstein et al. 2004). The measure of overall effectiveness described here is the “integrated” decrease in HAMD-17 score from baseline to endpoint, a (signed) area-under-the-curve measure, with larger values being more favorable to the new treatment over standard.
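The text does not state how the area under the decrease-from-baseline curve is computed. As a hedged illustration only, the sketch below assumes the trapezoidal rule over visit times in weeks, with no normalization. The function name, visit schedule and scores are hypothetical.

```python
def integrated_decrease(weeks, scores):
    """Signed area under the decrease-from-baseline curve (score units x weeks).

    ASSUMPTION: trapezoidal rule over the visit times; the source does not
    specify the integration scheme or any normalization."""
    if len(weeks) != len(scores) or len(weeks) < 2:
        raise ValueError("need matching visit times and scores (at least two)")
    baseline = scores[0]
    decrease = [baseline - sc for sc in scores]   # positive values = improvement
    area = 0.0
    for i in range(1, len(weeks)):
        area += 0.5 * (decrease[i - 1] + decrease[i]) * (weeks[i] - weeks[i - 1])
    return area

# Hypothetical HAMD-17 trajectory over an 8-week acute phase.
print(integrated_decrease([0, 2, 4, 8], [20, 16, 12, 10]))
```

A patient whose score worsens contributes a negative area, which is why the measure is described as signed.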

This example is anything but “ideal” for illustrating a situation where one particular value for the shadow price of health, λ, has been well established. Specifically, ICE ratios are expressed here in units of “$ per Week per unit of Integrated-Decrease-from-Baseline in HAMD-17 score over an 8 week Acute Treatment period for MDD,” hereafter denoted by $/Week/IDBAT. This scale certainly has no known relationship with any “standard” measure such as $/QALY. Instead, we have simply assumed here that λ = $0.26/Week/IDBAT and quantified both axes in Fig. 4 in “cost units” of $/Week. As we will see later (Figs. 5 and 6), there is a weak sense in which this particular choice of λ is most favorable to the “new” treatment.

Fig. 5 

The VAGR Acceptability curve (Fenwick et al. 2004; Van Hout et al. 1994) for the numerical example of Fig. 4 is non-monotone. This curve starts high (at 82.48%) because the two “less costly” ICE quadrants have confidence levels of 64.41% (SE) and 18.07% (SW) in Fig. 4. The VAGR curve reaches its maximum of 88.70% for WTP = 0.26, then ultimately drops to 78.01% in the limit as the ICE Ratio approaches +∞ (not displayed) because the confidence level of the NE quadrant (more effective BUT more costly) is only 13.60% in Fig. 4.

Fig. 6

As is always the case, the ALICE curve for the numerical example of Fig. 4 is monotone nondecreasing. It starts at the SE quadrant confidence level of 64.41% because this (less costly AND more effective) region is below and to the right of the “kinked” boundary formed by WTP = 0 and WTA = +∞. The ALICE curve then increases to 88.70% as WTP increases and WTA decreases until WTP = WTA = λ = 0.26, which corresponds to the only ALICE frontier that is linear (i.e., formed by the pair of rays with ICE Angles of θ = ±90° as in Fig. 3) rather than “kinked.” The ALICE curve ultimately increases to 96.08%, which is the total confidence of the three (less costly OR more effective) ICE Quadrants lying below or to the right of the “kinked” boundary with WTP = +∞ (upward vertical ray, θ = +135°, in Fig. 3) and WTA = 0 (horizontal ray to the left, θ = −135°, in Fig. 3).

In any case, Fig. 4 shows that our example illustrates a common situation. Relative to the standard treatment (paroxetine), the new treatment (duloxetine) could here represent what is known (somewhat derisively) as a “me too” treatment for MDD. Specifically, the bootstrap distribution of uncertainty completely covers the ICE origin and lends considerable credibility to at least 3 of the 4 ICE quadrants, provided the $/Week difference in medication acquisition cost is zero, as assumed here. Our objectives in exploring this case-study example are two-fold. We intend to convince you (i) that the new treatment is at least somewhat cost-effective relative to the standard treatment in cases like that depicted in Fig. 4, and (ii) that traditional VAGR acceptability curves are biased towards their average value in these critical “high uncertainty” cases. In particular, the all-important lower values of VAGR acceptability are biased upwards.

Figure 5 displays the non-monotone VAGR acceptability curve for our high uncertainty example over a relatively wide range (from 0 to 5) of the unknown common value WTP = λs = WTA. Only one numerical value within this wide range of alternative values for λs can correspond to the “true” shadow price of health.

It is not clear how VAGR or FOB themselves would interpret the information provided by Fig. 5. Because acceptability is always rather high (>0.80) over the finite range displayed here and the curve is also rather flat (max – min < 0.09), outcomes researchers might conclude that (i) choice of λ is relatively unimportant here and/or that (ii) the odds that the new treatment is more cost-effective than standard are at least 4:1 because 0.80/0.20 = 4.

Figure 6 displays the corresponding, monotone ALICE curve for this high uncertainty example. By covering the finite range for absolute ICE angles of 45° ≤ \(|\theta| \) ≤ 135°, the full infinite range of 0 ≤ s ≤ +∞ is easily visualized in Fig. 6. Furthermore, the values of the ICE Ratio = WTP displayed in Fig. 5 are now seen to be equally spaced along the horizontal axis of Fig. 6. Finally, Fig. 6 assumes that λ = 0.26 $/Week/IDBAT is the fixed, most relevant value for the true shadow price of health and also allows an overall acceptability level to be determined for all possible budget constraints of the form WTP = λs with s < 1 plus, by symmetry, WTA = λ/s.
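The angles quoted for Fig. 3 suggest a convention: θ = ±90° for the linear frontier, +135° for the upward vertical ray and −135° for the horizontal ray to the left. These values are consistent with measuring θ counter-clockwise from the south-east diagonal direction (1, −1) in standardized coordinates. Under that convention, the kinked ALICE frontier sits at \(|\theta| \) = 45° + ArcTan(s). The sketch below rests on this inferred convention, and its function names are hypothetical. With it, the ALICE level is simply the share of bootstrap points whose \(|\theta| \) lies below the frontier angle. That is an empirical distribution function of \(|\theta| \), which is one way to see why ALICE curves are always monotone nondecreasing.

```python
import math

def ice_angle_deg(x, y):
    """Polar angle of a standardized point (x, y), measured counter-clockwise
    from the south-east diagonal direction (1, -1), wrapped to (-180, 180]."""
    a = math.degrees(math.atan2(y, x)) + 45.0   # atan2 is measured from the +x axis
    if a > 180.0:
        a -= 360.0
    return a

def frontier_angle_deg(s):
    """|theta| of the kinked ALICE frontier for standardized slope s (s may be inf)."""
    return 45.0 + math.degrees(math.atan(s))

def alice_from_angles(points, s):
    """ALICE level as the share of points with |theta| below the frontier angle."""
    limit = frontier_angle_deg(s)
    return sum(abs(ice_angle_deg(x, y)) < limit for x, y in points) / len(points)

pts = [(1, -1), (1, 0.5), (-1, -2), (-1, 1)]    # hypothetical SE, NE, SW, NW
print([alice_from_angles(pts, s) for s in (0.0, 0.4, 0.6, math.inf)])
```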

Different choices for λ would yield different ALICE curves. However, all such alternative ALICE curves for a given set of data would have the same starting and ending points at \(|\theta| \) = 45° (s = 0) and \(|\theta| \) = 135° (s = +∞). Namely, the smallest ALICE value (0.6441 here) will always be the estimated confidence that the new treatment is both “less costly AND more effective” than standard, while the largest ALICE value (0.9608 here) will always be the estimated confidence that the new treatment is either “less costly OR more effective” than standard. These limits correspond to the two key ICE quadrant confidence levels needed to quantify statistical dominance (Obenchain et al. 2005).
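The endpoint values quoted for Figs. 5 and 6 follow directly from the Fig. 4 quadrant confidence levels. The short check below only re-derives those published percentages, taking the NW level as the remainder. It introduces no data beyond the numbers given in the text.

```python
# Quadrant confidence levels (%) for Fig. 4, as reported in the text.
quad = {"SE": 64.41, "SW": 18.07, "NE": 13.60}
quad["NW"] = 100.0 - sum(quad.values())      # remainder, about 3.92

vagr_at_wtp_0 = quad["SE"] + quad["SW"]      # horizontal line: less costly, 82.48
vagr_at_wtp_inf = quad["SE"] + quad["NE"]    # vertical line: more effective, 78.01
alice_at_s_0 = quad["SE"]                    # less costly AND more effective, 64.41
alice_at_s_inf = 100.0 - quad["NW"]          # less costly OR more effective, 96.08

for label, value in [("VAGR, WTP = 0", vagr_at_wtp_0),
                     ("VAGR, WTP -> inf", vagr_at_wtp_inf),
                     ("ALICE, s = 0", alice_at_s_0),
                     ("ALICE, s = inf", alice_at_s_inf)]:
    print(f"{label}: {value:.2f}%")
```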

Note that WTP = 0.26 $/Week/IDBAT yields the maximum VAGR acceptability (of 0.8870) in Fig. 5. The ALICE frontier at s = 1 is the linear frontier WTP = WTA = λ, so the ALICE level at \(|\theta| \) = 90° always equals the VAGR acceptability at WTP = λ. Thus, no ALICE curve for any alternative value of λ (different from 0.26 $/Week/IDBAT) can yield a larger acceptability level at \(|\theta| \) = 90° (s = 1) than the value (of 0.8870) displayed in Fig. 6. We are definitely not recommending that the numerical value of λ used to define ALICE levels be routinely chosen in this way. After all, this particular choice of λ is again, in a weak sense, most favorable to the new treatment!

Rather, our point here is simply that VAGR acceptability curves, by using only alternative linear frontiers (WTP = λ = WTA), are badly biased in all high uncertainty cases where the VAGR curve ends up being flat or non-monotone. ALICE curves are then much less biased (upwards or downwards) because they use realistic kinked frontiers. Consider even the ALICE curve that is biased upwards as much as possible, as in Fig. 6. It still suggests that administrative budget constraints (that reduce WTP and, when fair, also increase WTA) can drastically decrease the overall acceptability level of new over standard. This reduction is from 0.8870 at \(|\theta| \) = 90° (s = 1) to 0.6441 at \(|\theta| \) = 45° (s = 0) in Fig. 6, a reduction in confidence of 0.243.

After all, for two exactly equivalent treatments, the VAGR acceptability is expected to be 0.50 for all values of WTP. The corresponding ALICE level would then also be expected to equal 0.50 at s = 1 (\(|\theta| \) = 90°). However, it would be expected to drop to 0.25 at s = 0 (\(|\theta| \) = 45°) and to rise to 0.75 at s = +∞ (\(|\theta| \) = 135°), at least when cost and effectiveness differences are uncorrelated.
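A quick simulation illustrates these null values. As a hypothetical stand-in for two exactly equivalent treatments, it draws uncorrelated standard normal standardized differences centered at the ICE origin. The sample size and seed are arbitrary choices, and the function name is hypothetical.

```python
import math
import random

def null_acceptability(n=20000, seed=1):
    """Simulate exactly equivalent treatments (hypothetical setup) and return
    VAGR levels at several standardized slopes and ALICE levels at s = 0, 1, inf."""
    rng = random.Random(seed)
    # Uncorrelated, mean-zero standardized differences (x = lam*dE, y = dC).
    pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

    def share(cond):
        return sum(1 for x, y in pts if cond(x, y)) / n

    vagr = {s: share(lambda x, y, s=s: s * x - y > 0) for s in (0.0, 0.5, 1.0, 2.0, 5.0)}
    alice = {
        0.0: share(lambda x, y: x > 0 and y < 0),                # SE only
        1.0: share(lambda x, y: y < x),                          # linear frontier
        math.inf: share(lambda x, y: not (x <= 0 and y >= 0)),   # all but NW
    }
    return vagr, alice

vagr, alice = null_acceptability()
print(vagr)
print(alice)
```

The VAGR levels hover near 0.50 at every slope. The ALICE levels sit near 0.25, 0.50 and 0.75, up to simulation noise.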

In all cases where the ICE bootstrap uncertainty distribution lends credibility to only one quadrant or to at most two adjacent quadrants, the information contained in VAGR and ALICE curves will really be equivalent (even when the VAGR curve is decreasing due to increases in WTA). In these relatively simple (lower uncertainty) cases, VAGR acceptability is not really biased relative to the corresponding ALICE level.

7.2 Advantages and disadvantages of VAGR and ALICE curves

ALICE curves always concentrate attention upon only the uncertainty within the available data supporting an ICE policy decision, rather than upon any uncertainty about λ itself. Whenever a VAGR curve is non-monotone, it is actually also depicting additional uncertainty about λ. When a VAGR curve is monotone, Table 3 shows that it can be reinterpreted as corresponding to a fixed value of λ. For example, when a VAGR curve is non-decreasing, it can be reinterpreted as displaying the uncertainty associated with values of WTP less than any value of ICE Ratio = λ within the plotting range. When a VAGR curve is non-increasing, it can be reinterpreted as displaying the uncertainty associated with values of WTA larger than any value of ICE Ratio = λ within the plotting range.

On the other hand, it is difficult to appreciate how a VAGR acceptability curve or an ALICE curve could be viewed as a better graphical summary of ICE uncertainty than the bootstrap scatter itself! Given a scatter of bootstrap ICE uncertainty outcomes, (ΔE, ΔC), it will sometimes be fairly easy to visualize the corresponding VAGR or ALICE curve(s). The inverse problem of visualizing a bootstrap uncertainty scatter from its VAGR or ALICE curve is much more difficult. Specifically, all information about ICE radius (and thus returns-to-scale) has been discarded in the VAGR and ALICE formulations.

8 Statistical uncertainty and economic preference variation

To this point, we have concentrated upon the more “desirable” features of ICE preference maps. It is important to note that these positive aspects of preference quantification hold within a context where λ can be held fixed, as in Eq. 5, while both WTP and WTA are allowed to vary. In contexts where λ itself is deliberately varied, the strong implications of our first ICE preference axiom lead immediately to disturbing contradictions about economic preferences. These contradictions totally dominate or “swamp” any statistical uncertainty observable from patient level outcomes data.

There is a well-defined sense in which λ is little more than a nuisance parameter in ICE statistical inference. The wedge-shaped “count outwards” bootstrap confidence region (Obenchain 1999) is independent of the choice of λ in the sense that it is “equivariant” (commutative) under changes in λ. Indeed, the entire bootstrap distribution of ICE uncertainty itself has this equivariance property. Suppose patient level outcomes are transformed using alternative values for λ. The key practical implication of equivariance is that every (x, y) point within the bootstrap ICE uncertainty scatter keeps its status relative to the ICE ray confidence limits computed via the “count outwards” algorithm (Obenchain 1999). A point that lies inside, on, or outside those limits for one value of λ remains inside, on, or outside them for every other value. This point is dramatically illustrated in Figs. 7 and 8 for the high-uncertainty Dulx-Parx example introduced in Sect. 7.1.
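Changing λ only rescales the standardized x-coordinate, x = λΔE, by a positive factor. That rescaling maps rays through the ICE origin to rays and preserves their counter-clockwise order. The sketch below checks this numerically on a hypothetical, randomly generated scatter. Sorting replicates by angle relative to the observed point gives the same permutation before and after a tenfold change in λ. Any wedge defined by counting angle order statistics therefore contains the same replicates. Both helper names are hypothetical.

```python
import math
import random

def angular_order(points, observed):
    """Indices of points sorted by angle measured from the observed point's ray,
    with relative angles wrapped to [-pi, pi) (clockwise side first)."""
    ref = math.atan2(observed[1], observed[0])

    def rel(p):
        a = math.atan2(p[1], p[0]) - ref
        return (a + math.pi) % (2 * math.pi) - math.pi

    return sorted(range(len(points)), key=lambda i: rel(points[i]))

def rescale(points, factor):
    """Multiplying lambda by `factor` multiplies the standardized x = lambda*dE by it."""
    return [(factor * x, y) for x, y in points]

# Hypothetical scatter and observed point, for illustration only.
rng = random.Random(2024)
scatter = [(rng.gauss(0.1, 1.0), rng.gauss(-0.05, 1.0)) for _ in range(500)]
observed = (0.1, -0.05)
same = angular_order(scatter, observed) == angular_order(
    rescale(scatter, 10.0), rescale([observed], 10.0)[0])
print("angular order unchanged by a tenfold change in lambda:", same)
```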

Fig. 7

Note that the bootstrap “count outwards” wedge-shaped 95% confidence region for the high-uncertainty Dulx-Parx example has a central polar angle of about 237°. This region is equivariant under changes in Lambda; it appears to be essentially unchanged between the left-hand and right-hand panels above, where Lambda increases by a factor of ten but the scale along the horizontal axis (effectiveness in cost units) also increases by a factor of ten. The two bootstrap scatters depicted here are not exactly identical because different initial random number “seeds” were used. Finally, note that when the confidence wedges are colored with linear NB preferences (β = γ = η = 1), additional economic variation has been injected and cannot be ignored. Note the resulting pair of relatively “contradictory” NB preference distribution histograms depicted in the two lower panels.

Fig. 8

Here are two more visualizations of the bootstrap “count outwards” wedge-shaped 95% confidence region for the high-uncertainty Dulx-Parx example of Fig. 7. We again see “equivariance” as Lambda increases from 0.026 (left side) to 0.26 (right side) because the x-axis scaling again increases by a factor of 10. This time the confidence wedges are colored using highly-directional nonlinear economic preference maps (Beta = 1, Gamma = Eta = 5.828), and we again see that changing Lambda can inject additional economic variation that cannot be ignored. The two corresponding relatively “contradictory” preference distribution histograms are again depicted in the two lower panels.

Unfortunately, Figs. 7 and 8 also illustrate, by coloring bootstrap ICE outcomes within the equivariant confidence wedge with alternative preferences, that deliberately varying λ leads to incoherent ICE evaluations.

Because the “count outwards” confidence wedge (Obenchain 1999) displays equivariance under changes in λ, we contend that it quantifies only the statistical uncertainty within the two samples of patient level cost and effectiveness data about where the unknown true (ΔE, ΔC) outcome falls on the ICE plane. We do not wish to imply that sensitivity analyses concerning choice of λ should not be performed. However, we do think that health services researchers need to be much more aware of the extent of the distinct trauma injected into evaluations of economic preference by deliberately varying the numerical shadow price. After all, economic uncertainty about λ is quite separate from the quantifiable statistical uncertainty derived from patient level measurements of effectiveness and cost.

Because all ICE preference maps depend fundamentally upon choice of λ, and possibly also upon choice of returns-to-scale, β, and preference variation (shape) parameters, γ or η, we maintain that ICE preference maps are much more useful in interpreting the meaning of an equivariant ICE confidence region, as in Figs. 7 and 8, than in defining any such region.

9 Conclusions

There are two senses in which our efforts to consider nonlinear ICE preferences that are more realistic than linear NB have failed to circumvent the basic shortcomings of all ICE maps. First, all ICE maps attempt to reduce the effective dimensionality of the ICE decision space from two dimensions, (effectiveness, cost), to only one dimension: a scalar (possibly ordinal) measure of overall preference. Proponents of linear NB argue that confidence intervals for x-y differences are easier to construct and interpret than confidence intervals for standardized ICE slopes, s = y/x, or (unstandardized) ICE Ratios, λ × s. The reality is simply that NB confidence intervals are based upon overly simplistic and clearly unrealistic assumptions (like WTP = λ = WTA) that also make them much easier to misinterpret.

Second, due to the first axiom of ICE preferences, all ICE maps (linear or nonlinear) are much too sensitive to choice of λ to allow outcomes researchers to entertain a wide range of mutually exclusive and contradictory alternative values for the shadow price of health. While Figs. 7 and 8 illustrate that this practice can always inject unwanted incoherence, the effect is even more obvious in those situations where the central polar ICE angle of the wedge is less than 90° and the wedge lies mostly within the NE or SW quadrant.

On the other hand, our two-parameter, nonlinear ICE maps do illustrate some profound new relationships between basic ICE concepts.

9.1 Practical implications of the “link” function

As illustrated in Table 3, the “link” function, \( \lambda = {\sqrt {{\text{WTP}} \times {\text{WTA}}} }, \) justifies and quantifies the “kink,” 0 ≤ WTP < WTA, in empirically observed consumer’s thresholds. Perhaps even more importantly, this link provides an objective way to determine λ. One simply elicits pairs of matched WTP and WTA values that may vary from patient to patient within a specified disease state and then looks for across-patient consistency in the geometric means of these WTP and WTA estimates. In particular, there is no need to express effectiveness in QALYs when eliciting WTP and WTA values to determine λ in this way. Being able to use natural, disease specific units simplifies the elicitation process and should improve empirical accuracy. Like traditional views of λ itself, the concept of a QALY is actually quite complex (Johnson 2005). In fact, the link function may well provide the very definition of a “fair” shadow price; all other definitions apparently suffer serious vagaries and shortcomings (Gafni and Birch 2006).

When it comes to actual practical applications of ICE inference, the link function allows both WTP and WTA to vary with λ held fixed, as in the ICE ray frontiers defining ALICE curves. Importantly, the link function dictates that WTA must increase if WTP is (arbitrarily) decreased. This implication is profoundly different from the relatively naïve linear NB perspective, where WTA = λ = WTP is assumed.

Example: Assume that λ really is the often mentioned value of $50,000/QALY but that local government authorities or local payers in some particular region insist that $10,000/QALY is the maximum additional cost that they can possibly agree to pay. This is a simple budget constraint that does nothing to change the full, fair shadow price of health, yet it does reduce the local WTP to λ/5. It would then be absolutely unfair and arbitrary to assume that this sort of budget (maximum cost) restriction should also reduce WTA to λ/5. Instead, the \( \lambda = {\sqrt {{\text{WTP}} \times {\text{WTA}}} } \) link shows that the corresponding “fair” value of WTA would increase to 5λ = $250,000/QALY. Consider treatments that reduce both cost and effectiveness. Only those with an ICE ratio (cost saved per QALY lost) of at least $250,000/QALY are as preferable to society as the desirable treatments that increase both cost and effectiveness with an ICE ratio below $10,000/QALY.
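The arithmetic of this example follows from solving the link \( \lambda = \sqrt{\mathrm{WTP} \times \mathrm{WTA}} \) for WTA. In the minimal sketch below, the helper name `fair_wta` is hypothetical.

```python
import math

def fair_wta(lam, wtp):
    """WTA implied by the link lam = sqrt(WTP * WTA), i.e. WTA = lam**2 / WTP."""
    if wtp <= 0:
        raise ValueError("WTP must be positive for a finite WTA")
    return lam * lam / wtp

lam, local_wtp = 50_000.0, 10_000.0          # $/QALY, as in the example
wta = fair_wta(lam, local_wtp)
print(wta, math.sqrt(local_wtp * wta))       # WTA and the recovered shadow price
```

Cutting WTP by a factor of five raises the fair WTA by the same factor, leaving the geometric mean at λ.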

Finally, no single nonlinear ICE map, implied by explicit numerical choices for the β and γ power parameters in Eq. 1, really needs to be singled out for preferred use. All symmetric, differentiable maps necessarily satisfy the same geometric-mean relationship, \( \lambda = {\sqrt {{\text{WTP}} \times {\text{WTA}}} }, \) at all standardized outcome points (x, y) and (−y, −x).

9.2 Practical implications for ICE angle confidence and tolerance regions/intervals

In the “intervals or surfaces?” terminology of Briggs and Fenn (1998), ICE inference methods clearly need to be based upon 2-dimensional confidence regions (surfaces) rather than upon an infinite family of at least partially self-contradictory confidence intervals for overall preference that result from deliberately varying λ (Laska et al. 1999; Stinnett 1999; Stinnett and Mullahy 1998). There is no current consensus about how geometrically simple (easy to define in written text) the boundary of an “ideal” ICE confidence region needs to be.

What is clear is that wedge-shaped confidence regions (Briggs and Fenn 1998; Chaudhary and Stearns 1996; Cook and Heyse 2000; Obenchain 1997, 1999; Willan et al. 2001) have the potential to focus attention upon meaningful sub-regions of the ICE plane and to suggest clear preference-based actions. Specifically, the counter-clockwise (upper?) and clockwise (lower?) limiting ICE rays defining a wedge-shaped confidence region also define a confidence interval for the ICE ratio (s = y/x or ΔC/ΔE).

When applying Fieller’s theorem (Chaudhary and Stearns 1996), the computed limits may be imaginary for high levels of confidence, implying that the ICE ratio could then be any positive or negative numerical value. For example, the highest confidence level for which Fieller limits exist in the high-uncertainty Dulx-Parx example is 76%, with a counter-clockwise upper limit of slope zero and a clockwise lower limit also of slope zero. In fact, the equivariant bootstrap ICE confidence region with 76% confidence also has a central polar angle of almost 180° and consists of essentially the entire SE and SW quadrants, i.e., all positive or negative ICE ratios.
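For reference, a textbook form of Fieller’s set for a ratio R = ΔC/ΔE collects all R satisfying (ΔE² − z²V_E)R² − 2(ΔEΔC − z²C)R + (ΔC² − z²V_C) ≤ 0. Here V_E, V_C and C are the estimated variances and covariance of ΔE and ΔC, and z is a normal quantile. The sketch below uses the hypothetical name `fieller_limits` and assumes a normal rather than a t quantile. It reports three outcomes. A positive leading coefficient gives a bounded interval. A negative leading coefficient with real roots gives an exclusive set of two unbounded pieces. Imaginary roots mean every real ratio is included.

```python
import math
from statistics import NormalDist

def fieller_limits(d_e, d_c, var_e, var_c, cov_ec, conf=0.95):
    """Return (kind, lo, hi) for Fieller's confidence set of R = d_c / d_e.

    kind == "inclusive":       lo <= R <= hi
    kind == "exclusive":       R <= lo or R >= hi
    kind == "all real ratios": roots are imaginary; lo and hi are None
    """
    z2 = NormalDist().inv_cdf(0.5 + conf / 2) ** 2
    a = d_e ** 2 - z2 * var_e
    b = -2.0 * (d_e * d_c - z2 * cov_ec)
    c = d_c ** 2 - z2 * var_c
    if a == 0:
        raise ValueError("degenerate case: leading coefficient is zero")
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return ("all real ratios", None, None)
    root = math.sqrt(disc)
    lo, hi = sorted(((-b - root) / (2 * a), (-b + root) / (2 * a)))
    return ("inclusive" if a > 0 else "exclusive", lo, hi)

print(fieller_limits(2.0, 1.0, 0.01, 0.01, 0.0))   # well-determined dE
print(fieller_limits(0.1, 0.1, 1.0, 1.0, 0.0))     # dE indistinguishable from 0
```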

When the computed Fieller limits are real, they are the slopes of a pair of straight lines through the ICE origin, and the analyst still must determine which two of the resulting four ICE rays define the confidence wedge of interest. The correct choice becomes clear once the rays are plotted on the ICE plane along with the observed ICE outcome pair, (x, y) or (ΔE, ΔC); the correct pair of ICE rays then consists of the two rays closest to the observed ICE outcome point, counter-clockwise and clockwise, respectively.
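The ray-selection rule just described can be sketched as follows. Each real limit m defines two opposite rays, with directions (1, m) and (−1, −m). The code measures each ray’s counter-clockwise angle from the observed point’s ray. The first ray met turning counter-clockwise and the first met turning clockwise then bound the wedge. The helper name is hypothetical, and the degenerate case of an observed point lying exactly on a ray is ignored.

```python
import math

def confidence_wedge_rays(slope_a, slope_b, observed):
    """Return (ccw_ray, cw_ray) direction vectors: the rays nearest to the
    observed ICE point (x, y) counter-clockwise and clockwise, respectively."""
    ref = math.atan2(observed[1], observed[0])
    rays = []
    for m in (slope_a, slope_b):
        for sign in (1.0, -1.0):
            d = (sign, sign * m)                                     # ray direction
            ccw = (math.atan2(d[1], d[0]) - ref) % (2 * math.pi)     # in [0, 2*pi)
            rays.append((ccw, d))
    ccw_ray = min(rays, key=lambda r: r[0])[1]   # first ray met turning counter-clockwise
    cw_ray = max(rays, key=lambda r: r[0])[1]    # first ray met turning clockwise
    return ccw_ray, cw_ray

print(confidence_wedge_rays(0.2, 1.0, (1.0, 0.5)))    # observed point in NE
print(confidence_wedge_rays(0.2, 1.0, (-1.0, -0.5)))  # observed point in SW
```

The same pair of slopes yields different wedges depending on which quadrant holds the observed point. This is why plotting the rays alongside (ΔE, ΔC) settles the choice.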

When applying bootstrap methods (Briggs and Fenn 1998; Cook and Heyse 2000; Obenchain 1997, 1999; Willan et al. 2001), wedge-shaped confidence regions for all levels of confidence always exist. On the other hand, if a 95% confidence wedge were to occupy anywhere near 95% of the full ICE plane in terms of polar angular measure (i.e., 0.95 × 360° = 342°), that region would certainly not be very meaningful or interesting. For the high uncertainty Dulx-Parx example displayed in Figs. 7 and 8, the 95% confidence wedge subtends an ICE polar angle of 237°, which is only 65.8% of 360° and thus is at least somewhat restrictive and informative.

Furthermore, the minimum and maximum values of ICE radius observed for bootstrap outcomes falling strictly within a wedge-shaped ICE confidence region are easily computed. These ICE radii are expressed in the same units as both x and y (either effectiveness units or cost units). In high-uncertainty cases, the minimum observed ICE radius will be essentially zero, but the maximum will always be finite. In any case, a (strictly bounded) “wiper-blade” shaped ICE confidence region with a lower nominal confidence percentage than the original wedge-shaped region can be defined by counting “inward” a specified number of ICE radius order statistics, thereby decreasing the maximum ICE radius and/or increasing the minimum ICE radius.
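The radial trimming just described can be sketched as below. It assumes the bootstrap points already inside the confidence wedge are available as standardized (x, y) tuples. `wiper_blade` is a hypothetical helper and the four points are hypothetical.

```python
import math

def wiper_blade(points_in_wedge, k_inner=0, k_outer=0):
    """Trim a confidence wedge radially.

    Drops the k_inner smallest and k_outer largest ICE radii (order statistics)
    among bootstrap points already inside the wedge, and returns
    (min_radius, max_radius, points_kept)."""
    radii = sorted(math.hypot(x, y) for x, y in points_in_wedge)
    kept = radii[k_inner:len(radii) - k_outer]
    if not kept:
        raise ValueError("trimming removed every point")
    return kept[0], kept[-1], len(kept)

pts = [(3, 4), (0, 1), (6, 8), (0, 0.1)]     # hypothetical points inside a wedge
print(wiper_blade(pts), wiper_blade(pts, 1, 1))
```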

Alternatively, a bootstrap ICE confidence wedge can be converted into an interesting ICE tolerance wedge or an ICE ratio tolerance interval (Obenchain 1999) by simply including a few additional ICE angle order statistics within the wedge. For example, suppose that 25,000 bootstrap replications are computed (default value in Obenchain 2005, 2007) and that ICE angle order statistics are sorted around a full circle centered at the ICE origin. Any set of 0.95 × 25 K = 23,750 consecutive such order statistics then constitutes a 95% confidence wedge in the high-uncertainty cases where the ICE origin falls within the convex hull of the bootstrap ICE uncertainty scatter. The equivariant bootstrap confidence wedge for 25 K replications always results from counting outwards (counter-clockwise and clockwise, respectively) 11,875 consecutive ICE angle order statistics from the observed ICE ratio (i.e., the sample which uses the observed outcomes for each patient exactly once). The corresponding equivariant bootstrap tolerance wedge for 25 K replications that includes at least 95% of the entire ICE uncertainty distribution with 95% confidence then results from counting outwards 11,904 consecutive ICE angle order statistics from the observed ICE ratio (Obenchain 1999).
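The counting of confidence-wedge order statistics can be sketched as follows. The sketch assumes bootstrap replicates stored as standardized (x, y) tuples, with each replicate’s angle measured relative to the observed ICE point’s ray. `count_outwards_wedge` is a hypothetical helper, not the ICEplane implementation. It covers only the confidence wedge; the tolerance count of 11,904 comes from Obenchain (1999) and is not reproduced here.

```python
import math

def count_outwards_wedge(points, observed, conf=0.95):
    """Count outwards m = conf*n/2 angle order statistics on each side of the
    observed ICE ray. Returns (ccw_limit_deg, cw_limit_deg, m), with limits
    expressed as angles relative to the observed ray."""
    n = len(points)
    m = round(conf * n / 2)                  # e.g. 0.95 * 25000 / 2 = 11875
    if m < 1:
        raise ValueError("too few bootstrap points for this confidence level")
    ref = math.atan2(observed[1], observed[0])
    rel = [(math.atan2(y, x) - ref + math.pi) % (2 * math.pi) - math.pi
           for x, y in points]
    ccw = sorted(a for a in rel if a >= 0)                # nearest first
    cw = sorted((a for a in rel if a < 0), reverse=True)  # nearest first
    if len(ccw) < m or len(cw) < m:
        raise ValueError("not enough points on one side of the observed ray")
    return math.degrees(ccw[m - 1]), math.degrees(cw[m - 1]), m

# Hypothetical check: 40 unit vectors at +/-1..20 degrees around the observed ray.
obs = (1.0, 0.0)
pts = [(math.cos(math.radians(sgn * k)), math.sin(math.radians(sgn * k)))
       for k in range(1, 21) for sgn in (1, -1)]
print(count_outwards_wedge(pts, obs, conf=0.5))
```

With n = 25,000 and conf = 0.95, the same function counts m = 11,875 order statistics on each side, matching the counts quoted above.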

In view of the above non-parametric results, we contend that “count outwards” bootstrap ICE angle confidence/tolerance regions provide robust answers. Furthermore, our experience is that these bootstrap confidence limits are generally in very good agreement with Fieller limits in medium-to-large sample situations when Fieller’s theorem does yield real limits (Chaudhary and Stearns 1996). Unfortunately, simulation studies published before 1998 (Briggs and Fenn 1998) and since (Fan and Zhou 2007) tend to focus on questionable computational algorithms and report somewhat contradictory results. This is clearly an area where much greater consensus is badly needed.

9.3 Pragmatic choice of ICE preference map

Suppose the question is: “Are there a few distinctive forms of ICE maps that tend to characterize the full spectrum of fundamentally different preference patterns?” There might then be as few as only three saliently different types of ICE maps. The Laupacis et al. (1992) map of Fig. 1 (with β = 0) and the linear NB map (Stinnett and Mullahy 1998) of Fig. 2b (with β = γ = 1) clearly represent two simple and historic forms. Rather than consider a range of values for the η = γ/β shape parameter of realistic, nonlinear ICE preferences, it makes sense to concentrate upon the extreme maps with shape η = Ω ≈ 5.828 of (2) that satisfy the monotonicity axiom and allow willingness, (4), to be any non-negative value, as in Figs. 2d and 8. This third extreme form of ICE preference map still allows β to be <1, =1 or >1 to determine decreasing, constant or increasing returns-to-scale, (3).

The realistic, nonlinear ICE preference maps proposed here encompass the entire ICE plane rather than focus attention upon any particular sub-region. It can be quite confusing and counter-productive to, instead, use different basic terminology (Kent et al. 2004) within the NE and SW quadrants. Similarly, the FOB suggestion (Fenwick et al. 2004) to divide the ICE plane up into many sub-regions is tedious and counter-productive.

Better comparisons of alternative methodologies for assessing uncertainty in ICE estimates, and better vocabulary for communicating uncertainty generally, are clearly needed. To yield realistic and reliable results, bootstrap computing algorithms actually need to be somewhat sophisticated. While use of polar coordinates in ICE inference can be a great help in arriving at the “right” answer, many outcomes researchers and most policy makers are clearly uninterested in this high level of detail. People expect results expressed in Cartesian coordinates or as ratios. Realistic approaches to ICE treatment comparisons must ultimately address the unprecedented challenge of making a truly bivariate (2-dimensional) inference problem meaningful to non-technical audiences.