Statistics in CCR
Bayesian Optimal Interval Design: A Simple and Well-Performing Design for Phase I Oncology Trials
Ying Yuan1, Kenneth R. Hess1, Susan G. Hilsenbeck2, and Mark R. Gilbert3

Clinical Cancer Research

Abstract

Despite more than two decades of publications that offer more innovative model-based designs, the classical 3 + 3 design remains the most dominant phase I trial design in practice. In this article, we introduce a new trial design, the Bayesian optimal interval (BOIN) design. The BOIN design is easy to implement in a way similar to the 3 + 3 design, but is more flexible for choosing the target toxicity rate and cohort size and yields a substantially better performance that is comparable with that of more complex model-based designs. The BOIN design contains the 3 + 3 design and the accelerated titration design as special cases, thus linking it to established phase I approaches. A numerical study shows that the BOIN design generally outperforms the 3 + 3 design and the modified toxicity probability interval (mTPI) design. The BOIN design is more likely than the 3 + 3 design to correctly select the MTD and allocate more patients to the MTD. Compared with the mTPI design, the BOIN design has a substantially lower risk of overdosing patients and generally a higher probability of correctly selecting the MTD. User-friendly software is freely available to facilitate the application of the BOIN design. Clin Cancer Res; 22(17); 4291–301. ©2016 AACR.

Disclosure of Potential Conflicts of Interest Y. Yuan is a consultant/advisory board member for Agenus. K.R. Hess is an uncompensated consultant/advisory board member for Angiochem. No potential conflicts of interest were disclosed by the other authors.

Editor's Disclosures The following editor(s) reported relevant financial relationships: W.E. Barlow—None.

CME Staff Planners' Disclosures The members of the planning committee have no real or apparent conflicts of interest to disclose.

Learning Objectives Upon completion of this activity, the participant should have a better understanding of using the Bayesian optimal interval (BOIN) design for phase I clinical trials. BOIN is a novel phase I design that is as simple to implement as the 3 + 3 design, but yields significantly better performance comparable to more complicated model-based designs.

Acknowledgment of Financial or Other Support This activity does not receive commercial support.

1Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas. 2Duncan Cancer Center, Baylor College of Medicine, Houston, Texas. 3Center for Cancer Research, National Cancer Institute, Bethesda, Maryland.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
Corresponding Author: Ying Yuan, The University of Texas MD Anderson Cancer Center, Unit 1411, 1400 Pressler Street, Houston, TX 77030. Phone: 713-563-4271; Fax: 712-563-4243; E-mail: [email protected]
doi: 10.1158/1078-0432.CCR-16-0592
©2016 American Association for Cancer Research.

Introduction
Despite more than 20 years of publications with innovative model-based clinical trial designs that offer widely acknowledged improvements in efficiency, such designs are implemented in only a very small fraction of phase I trials. The 3 + 3 design (1–3), although widely criticized for its poor operating characteristics (i.e., poor performance in computer simulations of a wide variety of dose–toxicity scenarios; refs. 4–7), remains the dominant phase I trial design used in practice. As evidence, of 34 phase I trials published in Clinical Cancer Research in 2015, 32 used classical 3 + 3 designs or a related design with a minor variation. Most phase I trials conducted with the Children's Oncology Group have used the rolling 6 design (8), which trades a larger cohort and sample size in the face of rapid accrual for a faster completion of the trial, but shares similar operating characteristics with the 3 + 3 design for identifying the MTD.
The major reason for the dominance of the 3 + 3 design is its simplicity and transparency. The decision rule for dose escalation and de-escalation is determined before trial conduct, and physicians can easily inspect the rules to judge whether they fit with clinical practice. In contrast, the well-performing and innovative model-based designs, for example, the continual reassessment method (CRM; refs. 9–13), are considered by many to be statistically and computationally complex, leading practitioners to perceive dose allocations as coming from a "black box," which has hindered their use in practice. It would be ideal to have a phase I trial design that is as simple as the 3 + 3 design, but yields a performance that is comparable with that of the model-based designs.
The goal of this article is to introduce a novel Bayesian optimal interval (BOIN) design (14) that is simple to implement, similar to the 3 + 3 design, but is much more flexible and possesses superior operating characteristics that are comparable with those of the more complex model-based methods. On the basis of our experience, the underlying idea of the BOIN design has been well received by oncologists and has been used to design a number of phase I trials at The University of Texas MD Anderson Cancer Center and Baylor College of Medicine. The statistical methodology of the BOIN design was provided by Liu and Yuan (14). Here, we focus on delineating the links and differences between the BOIN and the 3 + 3 and related designs from a practical standpoint, paired with comprehensive numerical studies. Our goal is to change the current practice in which the vast majority of phase I trials use the 3 + 3 design, and expedite the adoption of novel clinical trial designs, leading to improved efficacy and ethics of phase I trials.
Improved algorithm-based designs have been proposed to obtain better operating characteristics than the 3 + 3 design. Durham and Flournoy (15) proposed the biased coin design that uses a biased coin to determine dose escalation and de-escalation. Lin and Shih (16) studied statistical properties of general "A+B" designs. Ivanova and colleagues (17) developed and compared several improved up-and-down designs. Ji and colleagues (18) proposed the modified toxicity probability interval (mTPI) design that performs better than the 3 + 3 design.
BOIN Design
The BOIN design takes a very simple form, rendering it easy to implement in practice. The decision of dose escalation and de-escalation involves only a simple comparison of the observed dose-limiting toxicity (DLT) rate at the current dose with a pair of fixed, prespecified dose escalation and de-escalation boundaries. Specifically, let $\hat{p}$ denote the observed DLT rate at the current dose, defined as

$$\hat{p} = \frac{\text{the number of patients experiencing DLT at the current dose}}{\text{the total number of patients treated at the current dose}},$$

and let $\lambda_e$ and $\lambda_d$ denote the prespecified dose escalation and de-escalation boundaries. The BOIN design can be described as follows (see also Fig. 1).
1. Treat the first patient or cohort of patients at the lowest dose. (In some trials, another dose, such as the second lowest dose, may be used as the starting dose.)
2. To assign a dose to the next patient or cohort of patients,
   a. if $\hat{p} \le \lambda_e$, escalate the dose;
   b. if $\hat{p} \ge \lambda_d$, de-escalate the dose;
   c. otherwise, retain the current dose.
3. Repeat step 2 until the maximum sample size is reached.
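The decision in step 2 can be sketched in a few lines of Python. This is only an illustration, not the authors' software: the function name `boin_decision` is hypothetical, the boundaries are the values this article reports for a 30% target DLT rate, and practical details (such as what to do at the lowest or highest dose) are deliberately left out.

```python
def boin_decision(n_dlt: int, n_treated: int, lambda_e: float, lambda_d: float) -> str:
    """Compare the observed DLT rate at the current dose with the two boundaries."""
    if n_treated <= 0:
        raise ValueError("at least one evaluable patient is required")
    p_hat = n_dlt / n_treated          # observed DLT rate at the current dose
    if p_hat <= lambda_e:
        return "escalate"
    if p_hat >= lambda_d:
        return "de-escalate"
    return "retain"


# Boundaries reported in the article for a target DLT rate of 30%.
LAMBDA_E, LAMBDA_D = 0.236, 0.358

for dlts in range(4):
    print(f"{dlts} of 3 with DLT ->", boin_decision(dlts, 3, LAMBDA_E, LAMBDA_D))
```

With a cohort of 3, the sketch escalates on 0 DLTs, retains on 1, and de-escalates on 2 or 3.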

The BOIN design shares the simplicity of the 3 + 3 design, which makes the decision of dose escalation/de-escalation by comparing the observed DLT rate $\hat{p}$ with 0/3, 1/3, 2/3, 0/6, 1/6, and 2/6. The BOIN design makes the decision by comparing $\hat{p}$ with two fixed boundaries, $\lambda_e$ and $\lambda_d$, which is arguably even simpler. The statistical rationale behind the BOIN design and the technical details of determining $\lambda_e$ and $\lambda_d$ are outlined in the Supplementary Data. Table 1 provides the values of $\lambda_e$ and $\lambda_d$ for commonly used target toxicity rates. For example, given the target DLT rate of 30%, the corresponding escalation boundary is $\lambda_e = 0.236$ and the de-escalation boundary is $\lambda_d = 0.358$. A BOIN design with cohorts of 3 patients will escalate the dose if 0 of 3 patients has DLT because the observed DLT rate $\hat{p} = 0/3 < 0.236$; de-escalate the dose if 2 of 3 patients have DLTs because the observed toxicity rate $\hat{p} = 2/3 > 0.358$; and retain the current dose if 1 of 3 patients has DLT because $0.236 < 1/3 < 0.358$. This example demonstrates that the 3 + 3 rule is actually nested within the BOIN design when the target DLT rate is 30% and the cohort size is 3. Because the 3 + 3 design requires that the number of patients treated at a dose cannot exceed 6 patients, whereas the BOIN design does not impose that requirement, the dose escalation/de-escalation rule for 6 patients may be different between the two designs.
The BOIN design, however, is much more flexible than the 3 + 3 design. It can target any prespecified DLT rate. Such flexibility is of great clinical utility. For instance, for some cancer populations for whom there is no effective treatment, a target DLT rate higher than 30% may be an acceptable trade-off to achieve higher treatment efficacy, while for other cancer populations, a lower target DLT rate, for example, 15% or 20%, may be more appropriate.
In addition, unlike the 3 + 3 design, for which the dose escalation and de-escalation decisions can be made only when we have 3 or 6 evaluable patients, the BOIN design does not require a fixed cohort size and allows for decision making at any time during the trial by comparing the observed DLT rate at the current dose with the escalation and de-escalation boundaries. Decisions regarding dose escalation and de-escalation can be made at any time as long as we can calculate the DLT rate at the current dose. Given the target DLT rate of 30%, the escalation boundary $\lambda_e = 0.236$ and the de-escalation boundary $\lambda_d = 0.358$ are equivalent to the dose escalation and de-escalation rules shown in Table 2. Similar dose escalation and de-escalation rules for the target DLT rates of 15%, 20%, and 25% are provided in Supplementary Table S1 in the Supplementary Data. Such flexibility has important practical utility and implications. First, it allows clinicians to "adaptively" change the cohort size during the course of the trial to achieve certain design goals. For example, to shorten the trial duration and reduce the sample size, clinicians often prefer to use a cohort size of 1 for the initial dose escalation, and then switch to a cohort size of 3 after observing the first DLT, as with the commonly used accelerated titration design (ATD; ref. 19). Such an accelerated titration can be easily and seamlessly performed using the BOIN design by simply switching the cohort size from 1 to 3 when the first DLT is observed. Unlike the ATD, which combines two independent empirical rules, the accelerated titration rule and the 3 + 3 rule, in an ad hoc way, the BOIN design achieves the same design goal under a single, coherent framework with assured statistical properties. In addition, the BOIN design includes the rolling 6 design as a special case. By allowing for the accrual of 2 to 6 patients concurrently, the BOIN design can mimic the rolling 6 design to achieve the same goal of trading a larger cohort and sample size for a faster completion of the trial. Therefore, in a sense, the BOIN design provides a generalization of the 3 + 3, ATD, and rolling 6 designs.
The BOIN design also offers clinicians the flexibility to handle a "passive" change in the cohort size. Often, the actual number of patients available for decision making deviates from the planned cohort size. In many phase I trials that use the 3 + 3 design, the actual number of patients treated at a dose often deviates from 3 or 6 for various logistic reasons; for example, some patients are not evaluable or have not received adequate treatment to be eligible (or many eligible patients become available in a short period). When that occurs, the decision of dose assignment is difficult for the next new patient because the 3 + 3 design does not tell us how to assign the dose if the number of (evaluable) patients is not 3 or 6. In contrast, the BOIN design can easily handle that issue because its decision of dose escalation/de-escalation only involves comparing the observed toxicity rate, which is calculable as long as at least one patient has been treated at the current dose and is evaluable, with the escalation and de-escalation boundaries $\lambda_e$ and $\lambda_d$. For example, if only 4 of 6 patients enrolled at a dose level were evaluable and provided toxicity data, assuming the target DLT rate of 30%, the dose would be escalated if 0 of 4 patients has DLT (because the observed toxicity rate < 0.236), and de-escalated if ≥ 2 of 4 patients have DLTs (because the observed toxicity rate > 0.359), or the current dose would be retained if 1 of 4 patients has DLT.
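Because the rule only compares a proportion with two constants, the DLT counts that trigger each decision can be tabulated for any number of evaluable patients. The sketch below (the helper name `decision_thresholds` is hypothetical) converts the Table 1 boundaries for a 30% target into integer cutoffs of the kind listed in Table 2. It uses ordinary floating-point arithmetic and assumes the boundary times the patient count never lands exactly on an integer, which holds for the values shown.

```python
import math


def decision_thresholds(n_evaluable, lambda_e, lambda_d):
    """Return (largest DLT count that escalates, smallest DLT count that de-escalates)."""
    escalate_max = math.floor(lambda_e * n_evaluable)    # escalate if DLTs <= this
    deescalate_min = math.ceil(lambda_d * n_evaluable)   # de-escalate if DLTs >= this
    return escalate_max, deescalate_min


LAMBDA_E, LAMBDA_D = 0.236, 0.358   # target DLT rate of 30% (Table 1)

for n in range(1, 19):
    esc, de = decision_thresholds(n, LAMBDA_E, LAMBDA_D)
    print(f"n = {n:2d}: escalate if DLTs <= {esc}, de-escalate if DLTs >= {de}")
```

For 4 evaluable patients this gives "escalate at 0, de-escalate at 2 or more", matching the example in the text.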
The 3 + 3 and BOIN designs take different approaches to select the MTD at the end of the trial. The 3 + 3 design directly chooses the MTD as the dose that is one level below the dose that yields 2 or more DLTs, ignoring the data observed at other doses, whereas the BOIN design uses a statistical technique called isotonic regression to pool information across doses to obtain a more efficient statistical estimate of the MTD. The BOIN design offers some desirable statistical properties that the 3 + 3 design lacks, such as coherence and consistency (see Supplementary Data for details).
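To illustrate what pooling information across doses by isotonic regression can look like, the following sketch applies a weighted pool-adjacent-violators algorithm to the observed DLT rates and picks the dose whose smoothed rate is closest to the target. It is a simplified stand-in, not the BOIN software's procedure: the function names are hypothetical, the raw observed rates are weighted by the number of evaluable patients, doses are assumed to have been tried contiguously from the lowest dose, and ties are broken toward the lower dose. The counts in the usage lines are the end-of-trial counts of the example trial described later in this article.

```python
def pava(rates, weights):
    """Weighted pool-adjacent-violators: nondecreasing fit to `rates`."""
    blocks = []  # each block: [weighted mean, total weight, number of doses pooled]
    for rate, weight in zip(rates, weights):
        blocks.append([rate, weight, 1])
        # Merge backwards while the nondecreasing order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def select_mtd(n_dlt, n_evaluable, target):
    """Return (1-based dose level, smoothed rates) for the doses that were tried."""
    tried = [j for j, n in enumerate(n_evaluable) if n > 0]
    raw = [n_dlt[j] / n_evaluable[j] for j in tried]
    fit = pava(raw, [n_evaluable[j] for j in tried])
    best = min(range(len(tried)), key=lambda k: abs(fit[k] - target))
    return tried[best] + 1, fit


# Example trial: evaluable patients and DLTs at dose levels 1-5.
dose, smoothed = select_mtd([0, 0, 1, 5, 0], [1, 1, 8, 17, 0], target=0.30)
print("selected dose level:", dose)
```

In this data set the observed rates are already nondecreasing, so no pooling occurs; pooling matters when a higher dose happens to show a lower observed rate than the dose below it.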

Figure 1. Flowchart of the BOIN design. Start at the lowest dose and treat a patient or a cohort of patients. If the maximum sample size has been reached, stop the trial and select the MTD. Otherwise, compute the DLT rate at the current dose: escalate the dose if the rate is $\le \lambda_e$, retain the current dose if the rate is within $(\lambda_e, \lambda_d)$, and de-escalate the dose if the rate is $\ge \lambda_d$.

Table 1. Dose escalation and de-escalation boundaries

                                  Target toxicity rate for the MTD
Boundary                     0.1     0.15    0.2     0.25    0.3     0.35    0.4
$\lambda_e$ (escalation)     0.078   0.118   0.157   0.197   0.236   0.276   0.316
$\lambda_d$ (de-escalation)  0.119   0.179   0.238   0.298   0.358   0.419   0.479
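This article defers the derivation of the boundaries to its Supplementary Data. As a hedged illustration, the sketch below assumes the closed-form boundaries we recall from Liu and Yuan (14), with the lower and upper reference toxicity rates set to 0.6 and 1.4 times the target; neither the formula nor these multipliers are stated in the present text, so both should be checked against that reference. Under these assumptions the computed values agree with Table 1 to within 0.001 (the table's third decimal appears to be truncated rather than rounded in some cells).

```python
import math


def boin_boundaries(target, phi1=None, phi2=None):
    """Assumed closed form for the escalation/de-escalation boundaries.

    phi1 and phi2 (hypothetical parameter names) are the toxicity rates regarded
    as clearly below and clearly above the target; the 0.6/1.4 multipliers are
    an assumption, not something stated in this article.
    """
    phi1 = 0.6 * target if phi1 is None else phi1
    phi2 = 1.4 * target if phi2 is None else phi2
    lambda_e = math.log((1 - phi1) / (1 - target)) / math.log(
        target * (1 - phi1) / (phi1 * (1 - target)))
    lambda_d = math.log((1 - target) / (1 - phi2)) / math.log(
        phi2 * (1 - target) / (target * (1 - phi2)))
    return lambda_e, lambda_d


for target in (0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4):
    lam_e, lam_d = boin_boundaries(target)
    print(f"target {target:.2f}: lambda_e = {lam_e:.4f}, lambda_d = {lam_d:.4f}")
```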

Another feature of the BOIN design is that the sample size is prespecified, which allows researchers to calibrate and choose appropriate sample sizes to achieve the desirable probability of correctly estimating and selecting the MTD. In contrast, with the 3 + 3 design, the sample size actually used in a clinical trial is random because the trial stops whenever 2 or more DLTs are observed at a dose. Because of such a stopping rule, the sample size of a 3 + 3 design tends to be excessively small. One might regard that as an advantage. However, it is actually one of the major drawbacks of the 3 + 3 design. The excessively small and random sample size means that the 3 + 3 design has a low chance of correctly identifying the MTD (see "Numerical Study" below), and precludes the possibility of calibrating the sample size to obtain good operating characteristics. Under the 3 + 3 design, the number of patients treated at any dose cannot be more than 6, which provides too little information to reliably estimate the true toxicity rate. For example, if 1 out of 6 patients experiences DLT, the estimate of the toxicity rate, 1/6 = 16.7%, seems low, but the 95% exact confidence interval (CI) for that estimate is (0.004–0.641), indicating that the true toxicity rate can be as high as 64.1%. Conversely, if 3 out of 6 patients experience DLTs, the estimate of the toxicity rate, 3/6 = 50%, seems very high, but the 95% CI for that estimate is (0.118–0.88); and the true toxicity rate can be as low as 11.8%. In practice, this deficiency is often remedied by expanding the cohort at the "MTD" selected by the 3 + 3 trial. Thus, the final sample size of a realized 3 + 3 trial and a BOIN trial without an expansion cohort might be similar. However, the difference is that under the approach of the 3 + 3 design plus cohort expansion, we lose the flexibility to continuously update our best estimate of the MTD based on the data accumulating during cohort expansion. Were the cohort expansion data to indicate that the MTD selected from the 3 + 3 trial was overdosing or underdosing patients, we would have to manually modify the dose decision in an ad hoc way. In contrast, the BOIN design does not require post hoc cohort expansion, and the dose escalation/de-escalation explicitly continues by treating each patient at a dose near the evolving estimate of the MTD.
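The exact confidence intervals quoted above can be reproduced with a short calculation. The sketch below assumes that "exact" refers to the Clopper–Pearson interval and solves for its limits by bisection on the binomial distribution function, using only the Python standard library; the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, value, iters=60):
    """Bisection on [0, 1] for f(p) = value, where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > value:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, level=0.95):
    """Clopper-Pearson interval for x DLTs among n patients."""
    alpha = 1 - level
    # Lower limit: P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


for x, n in [(1, 6), (3, 6)]:
    lo, hi = exact_ci(x, n)
    print(f"{x}/{n}: 95% exact CI = ({lo:.3f}, {hi:.3f})")
```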
An Example Trial
To illustrate the application of the BOIN design, we construct a hypothetical phase I trial that aims to find the MTD with a target DLT rate of 30%, 5 prespecified doses, and 30 patients. Figure 2 shows the process of the trial conduct. To accelerate dose escalation, the trial starts with a cohort size of 1, and then expands to a cohort size of 3 after the first DLT is observed, as in the ATD. The trial starts with the first patient receiving dose level 1 without experiencing DLT. On the basis of the dose escalation rules given in Table 2, the dose is then escalated to level 2 for the second patient, who also does not experience DLT. The dose escalation continues until the third patient experiences DLT at dose level 3, at which time the cohort is expanded to 3 by adding 2 more patients (i.e., patients 4 and 5) at dose level 3. Patients 4 and 5 do not experience DLTs. Retaining that dose, patients 6–8 are treated at dose level 3. Patients 6 and 7 do not experience DLTs and patient 8 is not evaluable. At that point, 5 evaluable patients have been treated at dose level 3 and one has experienced DLT. If the 3 + 3 design were used, it would be difficult to make the decision of dose assignment for the subsequent cohort because the number of patients at the current dose is not 3 or 6. In contrast, according to Table 2, the BOIN design escalates the dose to level 4 to treat patients 9–11, among whom patients 10 and 11 experience DLTs. If the 3 + 3 design were used, the trial would stop because 2 of the 3 most recently enrolled patients experience toxicity, and dose level 3 would be "claimed" as the MTD, precluding the chance to further learn the toxicity profile of the doses. In contrast, the BOIN design allows us to continue to learn the toxicity of the doses by de-escalating the dose to level 3 to treat patients 12–14, none of whom experiences DLT.
Then, at dose level 3, among a total of 9 treated patients, 8 are evaluable and only one patient has experienced DLT. According to the rules in Table 2, the dose is then escalated to level 4 to treat patients 15 to 17, none of whom experiences DLT. Of the 6 patients treated at dose level 4, only 2 of them have experienced DLTs. Thus, that dose is retained and patients 18 to 20 are treated at dose level 4. Similarly, based on the dose escalation/de-escalation rule of the BOIN design, patients 21–30 are all treated at dose level 4. Although patients 19 and 23 are not evaluable and the last patient (patient 30) does not form a complete cohort (of 3 patients), there is no issue under the BOIN design because it allows for decision making with any number of patients. At the end of the trial, a total of 17 evaluable patients have been treated at dose level 4, and 5 patients have experienced DLTs. Thus, dose level 4 is chosen as the MTD, with the estimated DLT rate = 29.4% and the 95% CI, 0.10–0.56. In contrast, using the 3 + 3 design, dose level 3 would have been chosen as the MTD, with an estimated DLT rate of 20% and a much wider 95% CI, 0.005–0.72.
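The interim decisions narrated above can be checked mechanically. The sketch below replays the evaluable-patient and DLT counts at each decision point of the hypothetical trial and applies the 30% target boundaries from Table 1; the list of interim looks is transcribed from the narrative, and the function name `decide` is hypothetical.

```python
LAMBDA_E, LAMBDA_D = 0.236, 0.358   # Table 1 boundaries for a 30% target DLT rate


def decide(n_dlt, n_evaluable):
    """BOIN decision from the cumulative counts at the current dose."""
    p_hat = n_dlt / n_evaluable
    if p_hat <= LAMBDA_E:
        return "escalate"
    if p_hat >= LAMBDA_D:
        return "de-escalate"
    return "retain"


# (dose level, evaluable patients at that dose so far, DLTs at that dose so far)
looks = [
    (1, 1, 0),   # patient 1
    (2, 1, 0),   # patient 2
    (3, 3, 1),   # patients 3-5 (cohort expanded to 3 after the first DLT)
    (3, 5, 1),   # patients 3-8, with patient 8 not evaluable
    (4, 3, 2),   # patients 9-11
    (3, 8, 1),   # after patients 12-14
    (4, 6, 2),   # after patients 15-17
]

decisions = [decide(dlt, n) for _, n, dlt in looks]
for (dose, n, dlt), decision in zip(looks, decisions):
    print(f"dose level {dose}: {dlt}/{n} DLTs -> {decision}")

print(f"final estimate at dose level 4: {5 / 17:.1%}")
```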
Table 2. Dose escalation and de-escalation boundaries for target toxicity rate = 30%

No. of patients treated at the current dose   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
Escalate if no. of DLTs ≤                     0  0  0  0  1  1  1  1  2  2  2  2  3  3  3  3  4  4
De-escalate if no. of DLTs ≥                  1  1  2  2  2  3  3  3  4  4  4  5  5  6  6  6  7  7

Figure 2. A hypothetical phase I clinical trial using the BOIN design. The numbers indicate the patient's identification. Three patients in each box form a cohort. (Vertical axis: dose level, 1 to 5; markers distinguish No DLT, DLT, and Not evaluable; the point at which the trial would stop under 3 + 3 is marked.)

Numerical Study
Simulation setting
We used computer simulations to evaluate the operating characteristics of the BOIN design. We considered a dose-finding trial with 5 dose levels and a maximum sample size of 30 patients (i.e., the maximum sample size of the 3 + 3 design). We investigated four commonly used target DLT rates: 15%, 20%, 25%, and 30%. For each of the target DLT rates, we examined 16 representative toxicity scenarios (i.e., the true toxicity rates of the five investigational doses), which varied in the location of the MTD and the gaps around the MTD. Under the standard assumption that toxicity monotonically increases with the dose, the gap (i.e., difference) between the MTD and its two adjacent doses controls the difficulty of dose finding because these adjacent doses are the most difficult ones to distinguish from the MTD. Table 3 shows the 16 true toxicity scenarios with target DLT rates of 20% and 25%. The scenarios for target DLT rates of 15% and 30% are given in Supplementary Table S2 in the Supplementary Data. Similar toxicity scenarios have been used to compare different phase I trial designs (20). Under each scenario, we simulated 10,000 trials to compare the BOIN design with the 3 + 3 and mTPI designs. Because the 3 + 3 design often stopped the trial early (e.g., when 2 out of 3 patients experienced DLTs) before reaching 30 patients, in these cases, the remaining patients were treated at the selected "MTD" as the cohort expansion, such that the three designs had comparable sample sizes. An alternative approach to match the average sample size of three designs is to use the average sample size of the 3 + 3 design as the sample size for the mTPI and BOIN designs. However, as explained in the Supplementary Data, that approach yields severely biased results. There are many variations of the 3 + 3 design. The 3 + 3 design that we used for the comparison is described in the Supplementary Data. We implemented the BOIN design using the R package "BOIN" with its default design parameters (21), and the mTPI design using the Web application with the interval width $\epsilon_1 = \epsilon_2 = 0.03$ (22). The mTPI and BOIN designs were implemented in a more efficient, fully sequential way (i.e., patients were treated one by one) because that is one important advantage of these two designs.
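To make the simulation setup concrete, the following sketch simulates fully sequential BOIN-style trials under one scenario from Table 3 (target DLT rate 0.2, scenario 7, where dose level 3 is the MTD). It is a deliberately simplified illustration and not the R package "BOIN": the function name is hypothetical, the MTD is selected from the raw observed rates rather than by isotonic regression, no safety stopping or dose-elimination rule is applied, and the random seed is arbitrary. Its results should therefore not be expected to reproduce the figures reported in this article.

```python
import random


def simulate_boin_trial(true_rates, target, lambda_e, lambda_d, max_n=30, rng=random):
    """Simulate one trial, treating patients one by one; doses are 0-based."""
    n_doses = len(true_rates)
    n = [0] * n_doses   # patients treated at each dose
    y = [0] * n_doses   # DLTs observed at each dose
    dose = 0            # start at the lowest dose
    for _ in range(max_n):
        n[dose] += 1
        y[dose] += rng.random() < true_rates[dose]   # True counts as 1
        p_hat = y[dose] / n[dose]
        if p_hat <= lambda_e and dose < n_doses - 1:
            dose += 1
        elif p_hat >= lambda_d and dose > 0:
            dose -= 1
    tried = [j for j in range(n_doses) if n[j] > 0]
    # Simplified selection: tried dose whose raw observed rate is closest to target.
    selected = min(tried, key=lambda j: abs(y[j] / n[j] - target))
    return selected, n


rng = random.Random(2016)
scenario = [0.08, 0.15, 0.20, 0.25, 0.35]   # Table 3, target 0.2, scenario 7
results = [simulate_boin_trial(scenario, 0.20, 0.157, 0.238, rng=rng)
           for _ in range(1000)]
pcs = sum(selected == 2 for selected, _ in results) / len(results)
print(f"share of simulated trials selecting dose level 3: {pcs:.2f}")
```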
Table 3. Sixteen true toxicity scenarios with the target DLT rates of 0.2 and 0.25

Target DLT rate = 0.2
                       Dose level
Scenario     1       2       3       4       5
1          0.20*   0.25    0.35    0.45    0.50
2          0.20*   0.30    0.40    0.50    0.60
3          0.15    0.20*   0.25    0.35    0.45
4          0.15    0.20*   0.30    0.45    0.55
5          0.10    0.20*   0.25    0.35    0.45
6          0.10    0.20*   0.30    0.40    0.55
7          0.08    0.15    0.20*   0.25    0.35
8          0.08    0.15    0.20*   0.30    0.45
9          0.05    0.10    0.20*   0.25    0.40
10         0.05    0.10    0.20*   0.30    0.45
11         0.05    0.10    0.15    0.20*   0.25
12         0.05    0.10    0.15    0.20*   0.30
13         0.02    0.06    0.10    0.20*   0.25
14         0.02    0.06    0.10    0.20*   0.30
15         0.02    0.05    0.07    0.10    0.20*
16         0.01    0.06    0.10    0.15    0.20*

Target DLT rate = 0.25
                       Dose level
Scenario     1       2       3       4       5
1          0.25*   0.35    0.45    0.60    0.70
2          0.25*   0.32    0.40    0.50    0.60
3          0.20    0.25*   0.35    0.45    0.60
4          0.18    0.25*   0.32    0.40    0.50
5          0.15    0.25*   0.35    0.50    0.65
6          0.13    0.25*   0.32    0.40    0.50
7          0.10    0.15    0.25*   0.30    0.40
8          0.10    0.18    0.25*   0.32    0.40
9          0.10    0.15    0.25*   0.35    0.50
10         0.06    0.13    0.25*   0.32    0.40
11         0.02    0.10    0.20    0.25*   0.30
12         0.02    0.10    0.18    0.25*   0.32
13         0.02    0.10    0.20    0.25*   0.35
14         0.01    0.07    0.13    0.25*   0.32
15         0.01    0.05    0.10    0.15    0.25*
16         0.01    0.06    0.12    0.18    0.25*

*Indicates the MTD.

Performance metrics
We considered four metrics to measure the performance of the designs:
i. The percentage of correct selection (PCS) of the true MTD in 10,000 simulated trials.
ii. The average number of patients allocated to the MTD across 10,000 simulated trials.
iii. The risk of overdosing, which is defined as the percentage of simulated trials in which a large percentage (e.g., more than 60% or 80%) of patients are treated at doses above the MTD, that is, how likely the design treats more than 60% or 80% of the patients at doses above the MTD. This risk measure is practically more relevant and useful than the average number of patients treated above the MTD across 10,000 simulated trials because in practice, the trial is typically conducted only once. What concerns the investigator is how likely the current trial overdoses a large percentage of patients, not if the trial was repeated thousands of times, on average how many patients would be overdosed.
iv. The risk of underdosing, which is defined as the percentage of simulated trials in which more than 80% of patients are treated at doses below the MTD (i.e., potentially subtherapeutic doses). We chose a higher cut-off value of 80% to define underdosing because in practice, underdosing tends to be of less concern than overdosing.
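The four metrics can be computed directly from per-trial simulation output. The sketch below assumes each simulated trial is summarized as a pair (selected dose, number of patients per dose) with 0-based dose indices; the function name `summarize` and the four toy trials are hypothetical and serve only to show the arithmetic.

```python
def summarize(trials, mtd, over_cut=0.60, under_cut=0.80):
    """Compute the four performance metrics from simulated trials.

    trials: list of (selected_dose, allocation) with 0-based dose indices.
    mtd:    0-based index of the true MTD.
    """
    k = len(trials)
    pcs = sum(selected == mtd for selected, _ in trials) / k
    avg_at_mtd = sum(alloc[mtd] for _, alloc in trials) / k
    # Share of trials in which more than `over_cut` of patients are above the MTD.
    overdosing = sum(
        sum(alloc[mtd + 1:]) > over_cut * sum(alloc) for _, alloc in trials) / k
    # Share of trials in which more than `under_cut` of patients are below the MTD.
    underdosing = sum(
        sum(alloc[:mtd]) > under_cut * sum(alloc) for _, alloc in trials) / k
    return {"pcs": pcs, "avg_at_mtd": avg_at_mtd,
            "risk_overdosing": overdosing, "risk_underdosing": underdosing}


# Four made-up trials of 30 patients each; the true MTD is dose index 2.
toy_trials = [
    (2, [3, 6, 15, 6, 0]),
    (3, [1, 2, 5, 12, 10]),   # 22 of 30 patients above the MTD
    (1, [10, 15, 5, 0, 0]),   # 25 of 30 patients below the MTD
    (2, [3, 3, 18, 6, 0]),
]
metrics = summarize(toy_trials, mtd=2)
print(metrics)
```

Calling `summarize` with `over_cut=0.80` gives the stricter overdosing measure also used in this article.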
Results
The percentage of correct selection of the MTD
As shown in Fig. 3, the BOIN design outperforms the 3 + 3 design with a substantially higher percentage of correct selection (PCS) of the MTD. For example, when the target DLT rate is 25%, the PCS of the BOIN design is mostly 12% to 16% higher than that of the 3 + 3 design. In particular, when the MTD is the highest dose (i.e., scenarios 15 and 16), the PCS of the BOIN design almost triples that of the 3 + 3 design. The BOIN design also outperforms the mTPI design, especially when the target DLT rate is low, such as 15% or 20%. In these cases, the PCS of the BOIN design is often about 6% to 10% higher than that of the mTPI design.

Figure 3. The PCS of the MTD under the 3 + 3, mTPI, and BOIN designs when the target toxicity rates are 15%, 20%, 25%, and 30%. A higher value is better. (Four panels, one per target DLT rate; horizontal axis: scenario 1 to 16; vertical axis: correct selection (%).)

Figure 4. Average number of patients allocated to the MTD under the 3 + 3, mTPI, and BOIN designs when the target toxicity rates are 15%, 20%, 25%, and 30%. A higher value is better. (Four panels, one per target DLT rate; horizontal axis: scenario 1 to 16; vertical axis: average no. of patients at MTD.)

Average number of patients allocated to the MTD
The performance of the 3 + 3 design depends on the location of the MTD and the target DLT rate (see Fig. 4). When the MTD is located at low doses (e.g., doses 1 and 2, corresponding to scenarios 1–6), the 3 + 3 design performs reasonably well. However, when the MTD is located at high doses (doses 4 and 5, corresponding to scenarios 11–16) or the target DLT rate is 30%, the 3 + 3 design performs substantially worse than the mTPI and BOIN designs. The BOIN design generally outperforms the mTPI design, assigning more patients to the MTD when the target DLT rate is 15% or 20%. The two designs are comparable when the target DLT rate is 25% or 30%.
Risk of overdosing
Among the three designs, the mTPI design has the highest risk of overdosing (i.e., assigning more than 60% or 80% of the patients to doses above the MTD), especially when the target DLT rates are 20%, 25%, and 30% (see Figs. 5 and 6). For example, when the target DLT rate is 25%, the mTPI design often has more than 40% chance of assigning more than 60% of patients to overly toxic doses, and more than 30% chance of assigning more than 80% of patients to overly toxic doses. In the Discussion section, we provide a theoretical explanation why the mTPI design tends to have such an alarmingly high risk of overdosing patients. The 3 + 3 design generally has the lowest risk of overdosing when the target DLT rates are 25% and 30%. This is consistent with previous research that found the 3 + 3 design to be overly conservative (4–7). Although being safe is desirable, being overly conservative is undesirable and results in poor precision for identifying the MTD. Because the dose selected in phase I is used in subsequent phase II trials to treat a much larger number of patients, misidentification of the MTD has the serious consequence of potentially treating a large number of patients at overly toxic or subtherapeutic doses. The BOIN design strikes a good balance in safety (i.e., the risk of overdosing) and identifying the MTD. Compared with the 3 + 3 design, the BOIN design has much higher PCS of the MTD (see Fig. 3). Compared with the mTPI, the BOIN design has a substantially lower risk of overdosing in almost all scenarios. Specifically, the risk of overdosing 80% or more of patients under the BOIN design is less than half of that under the mTPI design in most scenarios (see Fig. 6).

Figure 5. Risk of overdosing 60% or more of patients under the 3 + 3, mTPI, and BOIN designs when the target toxicity rates are 15%, 20%, 25%, and 30%. A lower value is better. (Four panels, one per target DLT rate; horizontal axis: scenario 1 to 16; vertical axis: risk of overdosing (%).)
Risk of underdosing
As the 3 + 3 design is conservative, it is not surprising that it generally has a higher risk of underdosing (i.e., treating more than 80% of patients at doses below the MTD), especially when the target DLT rate is 25% or 30% (see Fig. 7). The mTPI performs well when the target DLT rate is 25% or 30%, but has the highest risk of underdosing when the target DLT rate is 15%. In most scenarios, the BOIN design has the lowest or close to the lowest risk of underdosing.
Software for Practical Implementation
To facilitate the use of the BOIN design, we developed two freely available programs: an R package "BOIN" and a standalone desktop application. The desktop application has an intuitive graphical user interface and is convenient to use for most phase I trials. The R package provides extra flexibility that allows users to modify the code, if needed, to add additional features that have not been included in the package. The R package "BOIN" is available from CRAN (21), and the desktop program is available at the MD Anderson Software Download website (23). A statistical tutorial and protocol template for using the BOIN design are provided at the first author's website (24).
Discussion
This article introduces the BOIN design and compares it with the 3 + 3 and mTPI designs. The BOIN design is built upon rigorous statistical principles and treats each patient at dose levels near the evolving estimate of the MTD. This design is easy to implement in a manner similar to the 3 + 3 design, but provides much more flexibility in choosing the target toxicity rate and cohort size, and yields a substantially better performance. A numerical study showed that the BOIN design is more likely to correctly choose the MTD and allocate more patients to the MTD than the 3 + 3 design, and has substantially lower risk of overdosing patients than the mTPI design.

[Figure 6 plot omitted: four panels (target DLT rate = 15%, 20%, 25%, 30%); x-axis: Scenario (1–16); y-axis: Risk of overdosing (%); legend: BOIN, mTPI, 3 + 3.]

Figure 6. Risk of overdosing 80% or more of patients under the 3 + 3, mTPI, and BOIN designs when the target toxicity rates are 15%, 20%, 25%, and 30%. A lower value is better.
[Figure 7 plot omitted: four panels (target DLT rate = 15%, 20%, 25%, 30%); x-axis: Scenario (1–16); y-axis: Risk of underdosing (%); legend: BOIN, mTPI, 3 + 3.]

Figure 7. Risk of underdosing (i.e., assigning more than 80% of patients to doses below the MTD) under the 3 + 3, mTPI, and BOIN designs when the target toxicity rates are 15%, 20%, 25%, and 30%. A lower value is better.

The reason that the mTPI design has an excessively high risk of overdosing patients lies in the core of that method, that is, using the unit probability mass (UPM) as the criterion to determine dose escalation. Specifically, the mTPI defines three dosing intervals (i.e., the underdosing interval, proper dosing interval, and overdosing interval). Given a dosing interval and the observed toxicity data, the UPM is defined as the posterior probability of the interval divided by the length of the interval. The mTPI makes the decision of dose escalation and de-escalation based on which interval has the largest UPM. If the underdosing (or overdosing or proper dosing) interval has the largest UPM, the dose is escalated (or de-escalated or stays at the same level). Unfortunately, the UPM is not an appropriate indication of the toxicity of a dose, and leads to problematic decisions. To visualize this problem, consider a trial for which the target toxicity rate is 0.2, and the underdosing, proper dosing, and overdosing intervals are (0–0.17), (0.17–0.23), and (0.23–1), respectively. Suppose that at a certain stage of the trial, the observed data indicate that the posterior probabilities of the underdosing interval, proper dosing interval, and overdosing interval are 0.01, 0.09, and 0.9, respectively. In other words, the data indicate that there is a 90% chance that the current dose is overdosing and only a 9% chance that the current dose provides proper dosing. Despite such dominant evidence of overdosing, the mTPI dictates that the design stays at the same dose for treating the next new patient because the UPM for the proper dosing interval is the largest. Specifically, the UPM for the proper dosing interval is 0.09/(0.23 − 0.17) = 1.5, and the UPM for the overdosing interval is 0.9/(1 − 0.23) = 1.17. This example demonstrates that the UPM is not an appropriate indication of the toxicity of a dose, and as a result, the mTPI tends to keep treating patients at a toxic dose even when there is strong evidence for that dose being overly toxic. Our results seem to contradict those of the previous simulation study by Ji and Wang (25), which claimed that the mTPI is safer than the 3 + 3 design. As detailed in the Supplementary Data (see Supplementary Fig. S1 and Supplementary Table S4), the simulation in that study is biased because of the inappropriate way the sample sizes were matched between the designs.
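The UPM example in the mTPI discussion above can be checked numerically. The short Python sketch below recomputes the UPM for each interval from the stated posterior probabilities and applies the "largest UPM" rule as described in the text. The function and variable names are illustrative only, and the sketch does not reproduce the full mTPI design (for instance, it takes the posterior probabilities as given rather than deriving them from data).

```python
def unit_probability_mass(posterior_prob: float, lower: float, upper: float) -> float:
    """UPM = posterior probability of the interval divided by its length."""
    return posterior_prob / (upper - lower)


# Intervals and posterior probabilities from the example (target toxicity rate 0.2).
intervals = {
    "underdosing": (0.0, 0.17),
    "proper dosing": (0.17, 0.23),
    "overdosing": (0.23, 1.0),
}
posterior = {"underdosing": 0.01, "proper dosing": 0.09, "overdosing": 0.90}

upm = {name: unit_probability_mass(posterior[name], *bounds)
       for name, bounds in intervals.items()}

# The decision follows whichever interval has the largest UPM.
action_for_interval = {
    "underdosing": "escalate",
    "proper dosing": "stay",
    "overdosing": "de-escalate",
}
largest = max(upm, key=upm.get)
decision = action_for_interval[largest]

for name, value in upm.items():
    print(f"{name}: UPM = {value:.2f}")
print("decision:", decision)
```

Because the proper dosing interval is only 0.06 wide, its small posterior probability is inflated by the division, which is why the rule stays at the current dose despite a 90% posterior probability of overdosing.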
Recently, the BOIN design has been extended to find the MTD for drug-combination trials (26), which may further improve the utility of the BOIN design in practice. Under the BOIN design, many practical considerations are either automatically or easily accommodated. For example, the 3 + 3 design often includes one or more expansion cohorts with no way to monitor toxicity, whereas the BOIN design naturally accommodates ongoing monitoring by continuing to treat patients under its dose escalation and de-escalation rules. In addition, the BOIN design allows for starting the trial from any prespecified dose level, and stopping the trial when a dose accumulates a certain number of patients.
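The ongoing monitoring and the per-dose stopping rule mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the decision rule compares the observed DLT rate at the current dose with an escalation and a de-escalation boundary, the boundaries are passed in by the caller, and the names `boin_decision`, `should_stop`, and `max_per_dose` are hypothetical. The values 0.236 and 0.358 in the usage lines are assumed boundaries for a 30% target and should be checked against Table 1. The sketch omits the dose elimination rule and the handling of the lowest and highest dose levels.

```python
def boin_decision(n_dlt: int, n_treated: int,
                  lambda_e: float, lambda_d: float) -> str:
    """Compare the observed DLT rate at the current dose with the boundaries."""
    if n_treated <= 0:
        raise ValueError("no patients treated at the current dose")
    if not 0 <= n_dlt <= n_treated:
        raise ValueError("number of DLTs must lie between 0 and n_treated")
    observed_rate = n_dlt / n_treated
    if observed_rate <= lambda_e:
        return "escalate"
    if observed_rate >= lambda_d:
        return "de-escalate"
    return "stay"


def should_stop(n_treated_at_dose: int, max_per_dose: int) -> bool:
    """Stop once the current dose has accumulated the prespecified number of patients."""
    return n_treated_at_dose >= max_per_dose


# Assumed boundaries for a 30% target DLT rate (verify against Table 1).
lambda_e, lambda_d = 0.236, 0.358

print(boin_decision(0, 3, lambda_e, lambda_d))   # 0/3 DLTs
print(boin_decision(1, 3, lambda_e, lambda_d))   # 1/3 DLTs
print(boin_decision(2, 3, lambda_e, lambda_d))   # 2/3 DLTs
print(should_stop(12, max_per_dose=12))
```

The same function applies unchanged to patients enrolled after the dose-finding phase, which is how monitoring can continue through what the 3 + 3 design would treat as an unmonitored expansion cohort.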
The dose escalation and de-escalation boundaries provided in Table 1 are approximately symmetric around the target DLT rate. In some applications, we may prefer a tighter (i.e., lower) de-escalation boundary to impose a higher safety requirement. This can be done by reducing the value of the highest acceptable DLT rate in the BOIN software, which results in a tighter de-escalation boundary. Supplementary Table S3 in the Supplementary Data provides such an example. Using a tighter de-escalation boundary decreases the risk of overdosing, but the tradeoff is that it may reduce the PCS and the number of patients allocated to the MTD. This is because to correctly identify the MTD, it is necessary to experiment at the doses both below and above the MTD. In general, a conservative design tends to yield lower precision to identify the MTD, as exemplified by the 3 + 3 design. Given the fact that the BOIN has a substantially lower risk of overdosing patients than the mTPI, overdosing may not be a particular concern for the BOIN. If the investigator prefers a lower risk of overdosing, we recommend the boundaries in Supplementary Table S3, which generally yield good operating characteristics.
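The effect of tightening the de-escalation boundary can be illustrated with a short Python sketch. It assumes the boundary formulas from the published BOIN derivation, which are not restated in this section, with defaults of 0.6 and 1.4 times the target for the two reference rates. It also assumes that the "highest acceptable DLT rate" set in the software corresponds to the upper reference rate, called `p_tox` here. The names `boin_boundaries`, `p_saf`, and `p_tox` are hypothetical, and the tightened value 0.36 is chosen for illustration and is not the setting used in Supplementary Table S3.

```python
import math
from typing import Optional, Tuple


def boin_boundaries(target: float,
                    p_saf: Optional[float] = None,
                    p_tox: Optional[float] = None) -> Tuple[float, float]:
    """Return (escalation boundary, de-escalation boundary) for a target DLT rate.

    p_saf: rate regarded as subtherapeutic (assumed default 0.6 * target).
    p_tox: rate regarded as overly toxic (assumed default 1.4 * target).
    """
    if p_saf is None:
        p_saf = 0.6 * target
    if p_tox is None:
        p_tox = 1.4 * target
    if not 0 < p_saf < target < p_tox < 1:
        raise ValueError("require 0 < p_saf < target < p_tox < 1")
    lambda_e = (math.log((1 - p_saf) / (1 - target))
                / math.log(target * (1 - p_saf) / (p_saf * (1 - target))))
    lambda_d = (math.log((1 - target) / (1 - p_tox))
                / math.log(p_tox * (1 - target) / (target * (1 - p_tox))))
    return lambda_e, lambda_d


default_e, default_d = boin_boundaries(0.30)
tight_e, tight_d = boin_boundaries(0.30, p_tox=0.36)  # illustrative tighter setting

print(f"default:   escalate if rate <= {default_e:.3f}, de-escalate if rate >= {default_d:.3f}")
print(f"tightened: escalate if rate <= {tight_e:.3f}, de-escalate if rate >= {tight_d:.3f}")
```

Lowering `p_tox` moves only the de-escalation boundary toward the target and leaves the escalation boundary unchanged, which matches the asymmetric, more conservative behavior described above.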
We compared the BOIN, 3 + 3, and mTPI designs because they share similar simplicity and therefore are more likely to be implemented in practice. We did not include the CRM in our comparison because that design is more complicated to implement in practice. In addition to a lack of transparency, the choice of the model and prior in the CRM can be difficult for physicians, and an inappropriate choice can affect the performance of the
