[Federal Register Volume 62, Number 90 (Friday, May 9, 1997)]
[Notices]
[Pages 25712-25726]
From the Federal Register Online via the Government Publishing Office [www.gpo.gov]
[FR Doc No: 97-12139]
_______________________________________________________________________
Part III
Department of Health and Human Services
_______________________________________________________________________
Food and Drug Administration
_______________________________________________________________________
International Conference on Harmonisation; Draft Guideline on
Statistical Principles for Clinical Trials; Notice of Availability
=======================================================================
-----------------------------------------------------------------------
DEPARTMENT OF HEALTH AND HUMAN SERVICES
Food and Drug Administration
[Docket No. 97D-0174]
International Conference on Harmonisation; Draft Guideline on
Statistical Principles for Clinical Trials; Availability
AGENCY: Food and Drug Administration, HHS.
ACTION: Notice.
-----------------------------------------------------------------------
SUMMARY: The Food and Drug Administration (FDA) is publishing a draft
guideline entitled ``Statistical Principles for Clinical Trials.'' The
draft guideline was prepared under the auspices of the International
Conference on Harmonisation of Technical Requirements for Registration
of Pharmaceuticals for Human Use (ICH). The draft guideline is intended
to provide recommendations to sponsors and scientific experts regarding
statistical principles and methodology which, when applied to clinical
trials for marketing applications, will facilitate the general
acceptance of analyses and conclusions drawn from the trials.
DATES: Written comments by June 23, 1997.
ADDRESSES: Submit written comments on the draft guideline to the
Dockets Management Branch (HFA-305), Food and Drug Administration,
12420 Parklawn Dr., rm. 1-23, Rockville, MD 20857. Copies of the draft
guideline are available from the Drug Information Branch (HFD-210),
Center for Drug Evaluation and Research, Food and Drug Administration,
5600 Fishers Lane, Rockville, MD 20857, 301-827-4573. Single copies of
the draft guideline may be obtained by mail from the Office of
Communication, Training and Manufacturers Assistance (HFM-40), Center
for Biologics Evaluation and Research (CBER), 1401 Rockville Pike,
Rockville, MD 20852-1448 or by calling the CBER Voice Information
System at 1-800-835-4709 or 301-827-1800. Copies may be obtained from
CBER's FAX Information System at 1-888-CBER-FAX or 301-827-3844.
FOR FURTHER INFORMATION CONTACT:
Regarding the guideline: Robert T. O'Neill, Center for Drug
Evaluation and Research (HFD-700), Food and Drug Administration, 5600
Fishers Lane, Rockville, MD 20857, 301-827-3195.
Regarding the ICH: Janet J. Showalter, Office of Health Affairs
(HFY-20), Food and Drug Administration, 5600 Fishers Lane, Rockville,
MD 20857, 301-827-0864.
SUPPLEMENTARY INFORMATION: In recent years, many important initiatives
have been undertaken by regulatory authorities and industry
associations to promote international harmonization of regulatory
requirements. FDA has participated in many meetings designed to enhance
harmonization and is committed to seeking scientifically based
harmonized technical procedures for pharmaceutical development. One of
the goals of harmonization is to identify and then reduce differences
in technical requirements for drug development among regulatory
agencies.
ICH was organized to provide an opportunity for tripartite
harmonization initiatives to be developed with input from both
regulatory and industry representatives. FDA also seeks input from
consumer representatives and others. ICH is concerned with
harmonization of technical requirements for the registration of
pharmaceutical products among three regions: The European Union, Japan,
and the United States. The six ICH sponsors are the European
Commission, the European Federation of Pharmaceutical Industries
Associations, the Japanese Ministry of Health and Welfare, the Japanese
Pharmaceutical Manufacturers Association, the Centers for Drug
Evaluation and Research and Biologics Evaluation and Research, FDA, and
the Pharmaceutical Research and Manufacturers of America. The ICH
Secretariat, which coordinates the preparation of documentation, is
provided by the International Federation of Pharmaceutical
Manufacturers Associations (IFPMA).
The ICH Steering Committee includes representatives from each of
the ICH sponsors and the IFPMA, as well as observers from the World
Health Organization, the Canadian Health Protection Branch, and the
European Free Trade Area.
On January 17, 1997, the ICH Steering Committee agreed that a draft
guideline entitled ``Statistical Principles for Clinical Trials''
should be made available for public comment. The draft guideline is the
product of the Efficacy Expert Working Group of the ICH. Comments about
this draft will be considered by FDA and the other regulatory agency
members of the Efficacy Expert Working Group.
The draft guideline addresses principles of statistical methodology
applied to clinical trials for marketing applications. The draft
guideline provides recommendations to sponsors in the design, conduct,
analysis, and evaluation of clinical trials of an investigational
product in the context of its overall clinical development. The draft
guideline also provides guidance to scientific experts in preparing
application summaries or assessing evidence of efficacy and safety,
principally from late Phase II and Phase III clinical trials.
Application of the principles of statistical methodology is intended to
facilitate the general acceptance of analyses and conclusions drawn
from clinical trials.
This draft guideline represents the agency's current thinking on
statistical principles for clinical trials of drugs and biologics. It
does not create or confer any rights for or on any person and does not
operate to bind FDA or the public. An alternative approach may be used
if such approach satisfies the requirements of the applicable statute,
regulations, or both.
Interested persons may, on or before June 23, 1997, submit to the
Dockets Management Branch (address above) written comments on the draft
guideline. Two copies of any comments are to be submitted, except that
individuals may submit one copy. Comments are to be identified with the
docket number found in brackets in the heading of this document. The
draft guideline and received comments may be seen in the office above
between 9 a.m. and 4 p.m., Monday through Friday.
An electronic version of this draft guideline is available on the
Internet using the World Wide Web (WWW)
(http://www.fda.gov/cder/guidance.htm) or through the CBER home page
(http://www.fda.gov/cber/cberftp.html).
The text of the draft guideline follows:
Statistical Principles for Clinical Trials
Note: A Glossary of terms and definitions is provided as an
annex to this guideline.
Table of Contents
I. Introduction
1.1 Background and Purpose
1.2 Scope and Direction
II. Considerations for Overall Clinical Development
2.1 Study Context
2.1.1 Development Plan
2.1.2 Confirmatory Trial
2.1.3 Exploratory Trial
2.2 Study Scope
2.2.1 Population
2.2.2 Primary and Secondary Variables
2.3 Design Techniques to Avoid Bias
2.3.1 Blinding
2.3.2 Randomization
III. Study Design Considerations
3.1 Study Configuration
3.1.1 Parallel Group Design
3.1.2 Cross-Over Design
3.1.3 Factorial Designs
3.2 Multicenter Trials
3.3 Type of Comparison
3.3.1 Trials to Show Superiority
3.3.2 Trials to Show Equivalence or Non-inferiority
3.3.3 Dose-Response Designs
3.4 Group Sequential Designs
3.5 Sample Size
3.6 Data Capture and Processing
IV. Study Conduct
4.1 Trial Monitoring
4.2 Changes in Inclusion and Exclusion Criteria
4.3 Accrual Rates
4.4 Sample Size Adjustment
4.5 Interim Analysis and Early Stopping
4.6 Role of Independent Data Monitoring Committee (IDMC)
V. Data Analysis
5.1 Prespecified Analysis Plan
5.2 Analysis Sets
5.2.1 All Randomized Subjects
5.2.2 Per Protocol Subjects
5.2.3 Roles of the All Randomized Subjects Analysis and the
Per Protocol Analysis
5.3 Missing Values and Outliers
5.4 Data Transformation/Modification
5.5 Estimation, Confidence Intervals and Hypothesis Testing
5.6 Adjustment of Type I Error and Confidence Levels
5.7 Subgroups, Interactions and Covariates
5.8 Integrity of Data and Computer Software
VI. Evaluation of Safety and Tolerability
6.1 Scope of Evaluation
6.2 Choice of Variables and Data Collection
6.3 Set of Subjects to be Evaluated and Presentation of Data
6.4 Statistical Evaluation
6.5 Single Study versus Integrated Summary
VII. Reporting
7.1 Evaluation and Reporting
7.2 Summarizing the Clinical Database
7.2.1 Efficacy Data
7.2.2 Safety Data
Annex 1 Glossary
I. Introduction
1.1 Background and Purpose
The efficacy and safety of medicinal products should be
demonstrated by clinical trials that follow the guidance in ``Good
Clinical Practice: Consolidated Guideline (E6)'' adopted by the ICH,
May 1, 1996. The role of statistics in clinical trial design and
analysis is acknowledged as essential in that ICH guideline. The
proliferation of statistical research in the area of clinical trials
coupled with the critical role of clinical research in the drug
approval process and health care in general necessitate a succinct
document on statistical issues related to clinical trials. This
guideline is written primarily to attempt to harmonize the
principles of statistical methodology applied to clinical trials for
marketing applications submitted in Europe, Japan, and the United
States.
As a starting point, this guideline utilized the CPMP (Committee
for Proprietary Medicinal Products) Note for Guidance entitled
``Biostatistical Methodology in Clinical Trials in Applications for
Marketing Authorizations for Medicinal Products'' (December 1994).
It was also influenced by ``Guidelines on the Statistical Analysis
of Clinical Studies'' (March 1992) from the Japanese Ministry of
Health and Welfare and the U.S. FDA document entitled ``Guideline
for the Format and Content of the Clinical and Statistical Sections
of New Drug Applications'' (July 1988). Some topics related to
statistical principles and methodology are also embedded within
other ICH guidelines, particularly those listed below. The specific
guideline that contains related text will be identified in various
sections of this document.
E1: The Extent of Population Exposure to Assess Clinical Safety
E2A: Clinical Safety Data Management: Definitions and Standards
for Expedited Reporting
E2B: Clinical Safety Data Management: Data Elements for
Transmission of Individual Case Safety Reports
E2C: Clinical Safety Data Management: Periodic Safety Update
Reports for Marketed Drugs
E3: Structure and Content of Clinical Study Reports
E4: Dose-Response Information to Support Drug Registration
E5: Ethnic Factors in the Acceptability of Foreign Clinical Data
E6: Good Clinical Practice: Consolidated Guideline
E7: Studies in Support of Special Populations: Geriatrics
E8: General Considerations for Clinical Trials
E10: Choice of Control Group in Clinical Trials
M1: Standardization of Medical Terminology for Regulatory
Purposes
M3: Nonclinical Safety Studies for the Conduct of Human Clinical
Trials for Pharmaceuticals
This guideline is intended to give direction to sponsors in the
design, conduct, analysis, and evaluation of clinical trials of an
investigational product in the context of its overall clinical
development. The document will also assist scientific experts
charged with preparing application summaries or assessing evidence
of efficacy and safety, principally from late Phase II and Phase III
clinical trials.
1.2 Scope and Direction
The focus of this guideline is on statistical principles. It
does not address the use of specific statistical procedures or
methods. Specific procedural steps to ensure that principles are
implemented properly are the responsibility of the sponsor.
Integration of data across clinical trials is discussed, but is not
a primary focus of this guideline. Selected principles and
procedures related to data management or clinical trial monitoring
activities are covered in other ICH guidelines and are not addressed
here.
This guideline should be of interest to individuals from a broad
range of scientific disciplines. However, it is assumed that the
actual responsibility for all statistical work associated with
clinical trials will lie with an appropriately qualified and
experienced statistician, as indicated in the ``ICH Guideline for
Good Clinical Practice.'' The involvement of the statistician, in
collaboration with other clinical trial professionals, is to ensure
that statistical principles are applied appropriately in clinical
trials supporting drug development. Thus, the statistician should
have a combination of education/training and experience sufficient
to implement the principles articulated in this guideline.
All important details of the design, conduct, and proposed
analysis of each clinical trial contributing to a marketing
application should be clearly specified in a protocol written before
the trial begins. The extent to which the procedures in the protocol
are followed and the primary analysis is planned a priori will
contribute to the degree of confidence in the final results and
conclusions of the trial. The protocol and subsequent amendments
should be approved by the responsible personnel, including the trial
statistician. The trial statistician should ensure that the protocol
and any amendments cover all relevant statistical issues clearly and
accurately, using technical terminology as appropriate.
The principles outlined in this guideline are primarily relevant
to clinical trials conducted in the later phases of development,
many of which are confirmatory trials of efficacy. In addition to
efficacy, confirmatory trials may have as their primary variable a
safety variable (e.g., an adverse event, a clinical laboratory
variable, or an electrocardiographic measure) or a pharmacodynamic
or pharmacokinetic variable (as in a confirmatory bioequivalence
trial). Furthermore, some confirmatory findings may be derived from
data integrated across studies, and selected principles in this
guideline are applicable in this situation. Finally, although the
early phases of drug development consist mainly of clinical trials
that are exploratory in nature, statistical principles are also
relevant to these clinical trials. Hence, the substance of this
document should be applied as far as possible to all phases of
clinical development.
Many of the principles delineated in this guideline deal with
minimizing bias and maximizing precision. As used in this guideline,
the term ``bias'' describes the systematic tendency of any factors
associated with the design, conduct, analysis, and interpretation of
the results of clinical trials to make the estimate of a treatment
effect deviate from its true value. It is important to identify
potential sources of bias to the extent possible so that attempts to
limit such bias may be made. The presence of bias may seriously
compromise the ability to draw valid conclusions from clinical
studies.
Some sources of bias arise from the design of the trial, for
example an assignment of treatments such that subjects at lower risk
are systematically assigned to one treatment. Other sources of bias
arise during the conduct and analysis of a clinical trial. For
example, protocol violations and exclusion of subjects from analysis
based upon knowledge of subject outcomes are possible sources of
bias that may affect the accurate assessment of treatment effect.
Because bias can occur in subtle or unknown ways and its effect is
not measurable directly, it is important to evaluate the robustness
of the results and
primary conclusions of the trial. Robustness is a concept that
refers to the sensitivity of the overall conclusions to various
limitations of the data, assumptions, and analytic approaches.
Robustness implies that, if a variety of analyses of
the data that take into account changing assumptions were to be
performed, the treatment effect and primary conclusions of the trial
would be consistent. The interpretation of statistical measures of
uncertainty of the treatment effect and treatment comparisons should
involve consideration of the potential contribution of bias to the
p-value, confidence interval, or inference.
This guideline largely refers to the use of frequentist methods
when discussing hypothesis testing and/or confidence intervals.
However, the use of Bayesian or other approaches may be considered
when the reasons for their use are clear and when the resulting
conclusions are sufficiently robust compared to alternative
assumptions.
II. Considerations for Overall Clinical Development
2.1 Study Context
2.1.1 Development Plan
The broad aim of the process of clinical development of a new
drug is to find out whether there is a dose range and schedule at
which the drug can be shown to be simultaneously safe and effective,
to the extent that the risk-benefit relationship is acceptable. The
particular subjects who may benefit from the drug and the specific
indications for its use also need to be defined.
Satisfying these broad aims usually requires an ordered program
of clinical trials, each with its own specific objectives. This
should be specified in a clinical plan, or a series of plans, with
appropriate decision points and flexibility to allow modification as
knowledge accumulates. A marketing application should clearly
describe the main content of such plans, and the contribution made
by each trial. Interpretation and assessment of the evidence from
the total program of trials involves synthesis of the evidence from
the individual trials (see section 7.2). This is facilitated by
ensuring that common standards are adopted for a number of features
of the trials, such as dictionaries of medical terms, definition and
timing of the main measurements, handling of protocol deviations,
and so on. A statistical overview or meta-analysis may be
informative when medical questions are addressed in more than one
trial. Where possible, this should be envisaged in the plan so that
the relevant trials are clearly identified and any necessary common
features of their designs are specified in advance. Other major
statistical issues (if any) that are expected to affect a number of
trials in a common plan should be addressed in that plan.
2.1.2 Confirmatory Trial
A confirmatory trial is a controlled trial in which a hypothesis
is stated in advance and evaluated. As a rule, confirmatory trials
are necessary to provide firm evidence of efficacy or safety. In
such trials, the key hypothesis of interest follows directly from
the trial's primary objective, is always predefined, and is the
hypothesis that is subsequently tested when the trial is complete.
In a confirmatory trial, it is equally important to estimate with
due precision the size of the effects attributable to the treatment
of interest and to relate these effects to their clinical
significance.
Confirmatory trials are intended to provide firm evidence in
support of claims. Therefore, adherence to their planned design and
procedures is particularly important; unavoidable changes should be
explained and documented, and their effect examined. A justification
of the design of each such trial and of all other statistical
aspects, such as the planned analysis, should be set out in the
protocol. Each trial should address only a limited number of
questions.
Firm evidence in support of claims requires that the results of
the confirmatory trials demonstrate that the investigational product
under test has clinical benefits. The confirmatory trials should
therefore be sufficient to answer each key clinical question
relevant to the efficacy or safety claim clearly and definitively.
In addition, it is important that the basis for generalization to
the intended patient population is understood and explained; this
may also influence the number and type of centers and/or trials
needed. The results of the confirmatory trial(s) should be robust.
In some circumstances, the weight of evidence from a single
confirmatory trial may be sufficient.
2.1.3 Exploratory Trial
The rationale and design of confirmatory trials nearly always
rest on earlier clinical work carried out in a series of
exploratory studies. Like all clinical trials, these exploratory
studies should have clear and precise objectives. However, in
contrast to confirmatory trials, their objectives may not always
lead to simple tests of predefined hypotheses. In addition,
exploratory trials may sometimes require a more flexible approach to
design so that changes can be made in response to accumulating
results. Their analysis may entail data exploration; tests of
hypothesis may be carried out, but the choice of hypothesis may be
data dependent. Such trials cannot be the basis of the formal proof
of efficacy, although they may contribute to the total body of
relevant evidence.
Any individual trial may have both confirmatory and exploratory
aspects. For example, in most confirmatory trials the data are also
subjected to exploratory analyses which serve as a basis for
explaining or supporting their findings and for suggesting further
hypotheses for later research. The protocol should make a clear
distinction between the aspects of a trial that will be used for
confirmatory proof and the aspects that will provide data for
exploratory analysis.
2.2 Study Scope
2.2.1 Population
In the earlier phases of drug development, the choice of
subjects for a clinical trial may be heavily influenced by the wish
to maximize the chance of observing specific clinical effects of
interest. Hence, they may come from a very narrow subgroup of the
total patient population for which the drug may eventually be
indicated. However, by the time the confirmatory trials are
undertaken, the subjects in the trials should more closely mirror
the intended users. In these trials, it is generally helpful to
relax the inclusion and exclusion criteria as much as possible
within the target indication, while maintaining sufficient
homogeneity to permit a successful trial to be carried out. No
individual clinical trial can be expected to be totally
representative of future users because of the possible influences of
geographical location, the time when it is conducted, the medical
practices of the particular investigator(s) and clinics, and so on.
However, the influence of such factors should be reduced wherever
possible and subsequently discussed during the interpretation of the
trial results.
2.2.2 Primary and Secondary Variables
The primary variable (``target'' variable, primary endpoint)
should be the variable capable of providing the most clinically
relevant and convincing evidence directly related to the primary
objective of the trial. There should generally be only one primary
variable. This will usually be an efficacy variable, because the
primary objective of most confirmatory trials is to provide strong
scientific evidence regarding efficacy. Safety/tolerability may
sometimes be the primary variable, and will always be an important
consideration. Measurements relating to quality of life and health
economics are further potential primary variables. The selection of
the primary variable should reflect the accepted norms and standards
in the relevant field of research. The use of a reliable and
validated variable with which experience has been gained either in
earlier studies or in published literature is recommended. There
should be sufficient evidence that the primary variable can provide
a valid and reliable measure of some clinically relevant and
important treatment benefit in the subject population described by
the inclusion and exclusion criteria. The primary variable should
generally be the one used when estimating the sample size (see
section 3.5).
In many cases, and especially when treatment is directed at a
chronic rather than an acute process, the approach to assessing
subject outcome may not be straightforward and should be carefully
defined. For example, it is inadequate to specify mortality as a
primary variable without further clarification; mortality may be
assessed by comparing proportions alive at fixed points in time, or
by comparing overall distributions of survival times over a
specified interval. Another common example is a recurring outcome.
The measure of treatment effect may again be a simple dichotomous
variable (any occurrence during a specified interval), time to first
occurrence, or rate of occurrence (events per time units of
observation), to give a few possibilities. The assessment of
functional status over time in studying treatment for chronic
disease presents other challenges in selection of the primary
variable. There are many possible
approaches, such as comparisons of the assessments done at the
beginning and end of the interval of observation, comparison of
slopes calculated from all assessments throughout the interval, or
comparisons of the proportions of subjects exceeding or declining
beyond a prespecified threshold. To avoid multiplicity concerns, it
is critical to specify in the protocol the precise definition of the
primary variable as it will be used in the statistical analysis. In
addition, the clinical relevance of the specific primary variable
selected and the validity of the associated measurement procedures
will generally need to be addressed and justified in the protocol.
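As an illustration of why the precise definition matters, the
following minimal Python sketch computes three of the candidate
measures named above for a recurring outcome: any occurrence, time to
first occurrence, and rate of occurrence. The record layout, the
function names, and the data are hypothetical and are not part of the
guideline; they only show that the same raw data yield different
primary variables depending on the definition chosen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubjectRecord:
    """Hypothetical per-subject record; times are days since randomization."""
    follow_up_days: float
    event_days: List[float]


def any_occurrence(subject: SubjectRecord) -> bool:
    """Dichotomous variable: at least one event during follow-up."""
    return len(subject.event_days) > 0


def time_to_first(subject: SubjectRecord) -> Optional[float]:
    """Day of the first event, or None if no event was observed (censored)."""
    return min(subject.event_days) if subject.event_days else None


def event_rate(subjects: List[SubjectRecord]) -> float:
    """Events per subject-day of observation, pooled over all subjects."""
    total_events = sum(len(s.event_days) for s in subjects)
    total_days = sum(s.follow_up_days for s in subjects)
    return total_events / total_days


# Invented data for three subjects in one treatment group.
subjects = [
    SubjectRecord(follow_up_days=100, event_days=[10, 40]),
    SubjectRecord(follow_up_days=100, event_days=[]),
    SubjectRecord(follow_up_days=50, event_days=[45]),
]

proportion_with_event = sum(any_occurrence(s) for s in subjects) / len(subjects)
first_event_days = [time_to_first(s) for s in subjects]
pooled_rate = event_rate(subjects)

print(proportion_with_event, first_event_days, pooled_rate)
```

Each summary weights the data differently (the dichotomous variable
ignores repeat events, while the rate counts them all), which is one
reason the protocol needs to name a single definition in advance.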
The primary variable should be specified in the protocol, along
with the rationale for its selection. Redefinition of the primary
variable after unblinding will almost always be unacceptable, since
the biases this introduces are difficult to assess. When relevant,
the validity and reliability of the primary variable should be
described. Secondary variables are either supportive measurements
related to the primary objective or measurements of effects related
to the secondary objectives. Their predefinition in the protocol is
also important, as well as an explanation of their relative
importance and roles in interpretation of trial results. When the
clinical effect defined by the primary objective is to be measured
in more than one way, the protocol should identify one of the
measurements as the primary variable on the basis of clinical
relevance, importance, objectivity, and/or other relevant
characteristics, whenever such selection is feasible. Another
strategy that may be useful in some situations is to integrate or
combine the multiple measurements into a single or ``composite''
variable, using a predefined algorithm. Indeed, the primary variable
sometimes arises as a combination of multiple clinical measurements
(e.g., the rating scales used in arthritis, psychiatric disorders,
and elsewhere). This approach addresses the multiplicity problem
without requiring adjustment for multiple comparisons. The method of
combining the multiple measurements should be specified in the
protocol, and an interpretation of the resulting scale should be
provided in terms of the size of a clinically relevant benefit. When
composite variables are used as primary variables, the individual
components of these variables are often analyzed separately. When a
rating scale is used as a primary variable, it is especially
important to address factors such as content validity, inter- and
intrarater reliability, and sensitivity for discriminating different
medical conditions.
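The idea of a predefined combining algorithm can be sketched as
follows. This is a minimal Python illustration, assuming a simple
weighted sum; the component names and weights are invented for the
example, and the guideline does not prescribe any particular
combining rule. The point is only that the rule is fixed before any
data are seen and is applied identically to every subject.

```python
from types import MappingProxyType
from typing import Mapping

# Hypothetical components and weights, fixed in the protocol before
# unblinding. MappingProxyType makes the mapping read-only at run time.
COMPOSITE_WEIGHTS = MappingProxyType(
    {"pain": 0.5, "stiffness": 0.25, "function": 0.25}
)


def composite_score(components: Mapping[str, float]) -> float:
    """Weighted sum of the prespecified components for one subject."""
    missing = set(COMPOSITE_WEIGHTS) - set(components)
    if missing:
        # Missing components need a prespecified rule; this sketch
        # simply refuses to compute a score.
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(
        weight * components[name]
        for name, weight in COMPOSITE_WEIGHTS.items()
    )


score = composite_score({"pain": 4, "stiffness": 2, "function": 6})
print(score)
```

Because a single score is analyzed, only one hypothesis test is
performed on the primary variable, which is how this approach avoids
an adjustment for multiple comparisons.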
In some cases, ``global assessment'' variables are developed to
measure the overall safety, overall efficacy, and/or overall
usefulness of a treatment. This type of variable integrates
objective variables and the investigator's overall impression about
the state or change in the state of the subject, and is usually a
scale of ordered categorical ratings. Global assessments of overall
effectiveness are well established in many therapeutic areas,
especially psychotropic drugs and nonsteroidal anti-inflammatory
drugs.
Global assessment variables generally have a subjective
component. When a global assessment scale is used as a primary or
secondary variable, fuller details should be included in the
protocol with respect to:
(1) The relevance of the global scale to the primary objective
of the trial;
(2) The basis for the validity of the scale;
(3) How to utilize the data collected on an individual subject
to assign him/her to a unique category of the global assessment
scale;
(4) How to uniquely categorize subjects with missing data.
If objective variables are considered by the investigator when
making a global assessment, then those objective variables should be
considered additional primary or, at least, important secondary
variables.
Overall usefulness integrates components of both benefit and
risk and reflects the decisionmaking process of the treating
physician, who must weigh benefit and risk in making product use
decisions. A problem with global usefulness scales is that their use
could in some cases lead to two products being declared equivalent
despite having very different profiles of beneficial and adverse
effects. For example, judging the global
usefulness of a treatment as equivalent or superior to an
alternative may mask the fact that it has little or no efficacy but
fewer adverse effects. Therefore, if usefulness is used as a primary
variable, it is important to consider specific efficacy and safety
outcomes separately as additional primary variables.
It may sometimes be desirable to use more than one primary
variable, each of which (or a subset of which) could be a sufficient
basis for marketing approval, to cover the range of effects of the
therapies. The planned manner of interpretation of this type of
evidence should be carefully spelled out. For example, it should be
clear whether an impact on any of the variables, some minimum number
of them, or all of them, would be considered necessary for approval.
The primary hypothesis or hypotheses should be clearly stated with
respect to the primary variables identified and the approach to
testing the hypotheses described. This should include specification
of the statistical parameters being tested (e.g., mean, percentage,
distribution). The effect on the Type I error should be explained
because of the potential for multiple comparison problems (see
section 5.6); the method of controlling Type I error should be given
in the protocol. The extent of intercorrelation among the proposed
primary variables may be considered in evaluating the impact on Type
I error. If the success of the trial depends upon demonstrating
effects on all of the designated primary variables, then there is no
need for adjustment of the Type I error, but the impact on Type II
error and sample size needs should be carefully considered.
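The arithmetic behind these statements can be sketched in a few
lines of Python. The sketch assumes, for simplicity, that the tests
on the primary variables are statistically independent. Real primary
variables are usually correlated, so the figures are illustrative
only. The function names are hypothetical, and the Bonferroni
adjustment is shown as one common method, not as the method
recommended by this guideline.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent tests,
    each run at level alpha, when no variable has a true effect."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_level(alpha: float, k: int) -> float:
    """Per-test level that keeps the familywise error at or below alpha."""
    return alpha / k


def joint_power(per_variable_power: float, k: int) -> float:
    """Power to show an effect on all k variables, assuming independence."""
    return per_variable_power ** k


alpha, k = 0.05, 3

# Success on ANY one of three variables, each tested at 0.05:
unadjusted = familywise_error(alpha, k)

# The same rule with each test run at alpha / k:
adjusted = familywise_error(bonferroni_level(alpha, k), k)

# Success requires ALL three variables: no Type I adjustment is
# needed, but the power falls (here from 0.9 per variable).
power_all = joint_power(0.9, k)

print(round(unadjusted, 4), round(adjusted, 4), round(power_all, 3))
```

Under these assumptions, an "any of three" rule raises the Type I
error from 0.05 to roughly 0.14 unless adjusted, while an "all of
three" rule leaves the Type I error controlled but reduces power from
0.9 to about 0.73, which is the Type II error and sample size concern
noted above.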
When direct assessment of the clinical benefit to the subject
through observing actual clinical efficacy is not practical,
indirect criteria (surrogate variables) may be considered. Commonly
accepted surrogate variables are used in a number of indications
where they are believed to be reliable predictors of clinical
benefit. There are two principal concerns with the introduction of
any proposed surrogate variable. First, it may not be a true
predictor of the clinical outcome of interest. For example, it may
measure treatment activity along one particular pathway, but may not
provide full information on the range of actions and ultimate
effects of the treatment, whether positive or negative. There have
been many instances where treatments showing a highly positive
effect on a proposed surrogate have ultimately been shown to be
detrimental to the subjects' clinical status; conversely, there are
cases of treatments conferring clinical benefit without measurable
impact on proposed surrogates. Additionally, proposed surrogate
variables may not yield a quantitative measure of clinical benefit
that can be weighed directly against adverse effects. Statistical
criteria for validating surrogate variables have been proposed, but
the experience with their use is relatively limited. In practice,
the strength of the evidence for surrogacy depends upon the
biological plausibility of the relationship, the demonstration in
epidemiological studies of the prognostic value of the surrogate for
the clinical outcome, and evidence from clinical trials that
treatment effects on the surrogate correspond to effects on the
clinical outcome. Relationships between clinical and surrogate
variables for one product do not necessarily apply to a product with
a different mode of action for treating the same disease.
Dichotomization or other categorization of continuous or ordinal
variables may sometimes be desirable. Criteria of ``success'' and
``response'' are common examples of dichotomies that should be
specified precisely in terms of, for example, a minimum percentage
improvement (relative to baseline) in a continuous variable or a
ranking categorized as at or above some threshold level (e.g.,
``good'') on an ordinal rating scale. The reduction of diastolic
blood pressure below 90 mmHg is a common dichotomization.
Categorizations are most useful when they have clear clinical
relevance. The criteria for categorization should be predefined and
specified in the protocol, as knowledge of trial results could
easily bias the choice of such criteria. Because categorization
normally implies a loss of information, a consequence will be a loss
of power in the analysis; this should be accounted for in the sample
size calculation.
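As a rough indication of the size of this loss, the sketch below
computes the asymptotic relative efficiency of comparing two groups on
a dichotomized outcome instead of on the underlying continuous
variable. It is an illustration under stated assumptions, not a
replacement for a trial-specific sample size calculation: the outcome
is assumed to be normally distributed with equal variance in both
groups, the treatment effect is assumed to be a small shift in mean,
and the cut point is expressed in standard deviations from the mean.
The function names are hypothetical.

```python
from statistics import NormalDist


def dichotomization_efficiency(cut_z: float) -> float:
    """Approximate efficiency of a comparison of proportions above
    cut_z relative to a comparison of means, for a standard normal
    outcome and a small shift alternative."""
    nd = NormalDist()
    p = nd.cdf(cut_z)
    return nd.pdf(cut_z) ** 2 / (p * (1.0 - p))


def sample_size_inflation(cut_z: float) -> float:
    """Approximate factor by which the sample size grows to recover
    the power lost through dichotomization."""
    return 1.0 / dichotomization_efficiency(cut_z)


if __name__ == "__main__":
    # Cut at the mean: efficiency is 2/pi, about 0.64.
    print(dichotomization_efficiency(0.0))
    # Cut one standard deviation above the mean: the loss is larger.
    print(dichotomization_efficiency(1.0))
    print(sample_size_inflation(0.0))
```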
2.3 Design Techniques to Avoid Bias
The two most important design techniques for avoiding bias in
clinical trials are blinding and randomization, and these should be
a normal feature of most controlled clinical trials intended to be
included in a marketing application. Most such trials follow a
double-blind approach in which treatments are prepacked in
accordance with a suitable randomization schedule and supplied to
the trial center(s) labeled only with the subject number and the
treatment period, so that no one involved in the conduct of the
trial is aware of the specific treatment allocated to any particular
subject, not even as a code letter. This approach will be assumed in
section 2.3.1 and most of section 2.3.2, exceptions being considered
at the end. The protocol should also specify
[[Page 25716]]
procedures aimed at minimizing any anticipated irregularities in
study conduct that might impair a satisfactory analysis, including
various types of protocol violations, withdrawals, and missing
values. The protocol should consider ways both to reduce the
frequency of such problems and to handle the problems that do occur
in the analysis of data.
2.3.1 Blinding
Blinding is intended to limit the occurrence of conscious and
unconscious bias in the conduct and interpretation of a clinical
trial arising from the influence that knowledge of treatment may
have on the recruitment and allocation of subjects, their subsequent
care, the attitudes of subjects to the treatments, the assessment of
endpoints, the handling of withdrawals, the exclusion of data from
analysis, and so on. The essential aim is to prevent identification
of the treatments until all such opportunities for bias have passed.
A double-blind trial is one in which neither the subject nor any
of the investigator or sponsor staff involved in the treatment or
clinical evaluation of the subjects is aware of the treatment
received. This includes anyone determining subject eligibility,
evaluating endpoints, or assessing compliance with the protocol.
This level of blinding is maintained throughout the conduct of the
trial; only when the data are cleaned to an acceptable level of
quality will appropriate personnel be unblinded. If any of the
sponsor staff who are not involved in the treatment or clinical
evaluation of the subjects are required to be unblinded to the
treatment code (e.g., bioanalytical scientists, auditors, those
involved in serious adverse event reporting), the sponsor should
have adequate standard operating procedures (SOP's) to guard against
inappropriate dissemination of treatment codes. In a single-blind
trial the investigator and/or his staff are aware of the treatment
but the subject is not. In an open-label trial the identity of
treatment is known to all. The double-blind trial is the optimal
approach. This requires that the treatments to be applied during the
trial cannot be distinguished in any way (appearance, taste, etc.)
either before or during administration, and that the blind is
maintained appropriately during the whole trial.
Difficulties in achieving the double-blind ideal can arise
because: (1) The treatments may be of a completely different nature,
for example, surgery and drug therapy; (2) two drugs may have
different formulations and, although they could be made
indistinguishable by the use of capsules, changing the formulation
might also change the pharmacokinetic and/or pharmacodynamic
properties, so that bioequivalence of the formulations may need to
be established; (3) the daily pattern of administration of two
treatments may differ. One way of achieving double-blind conditions
under these circumstances is to use a ``double dummy'' technique.
This technique may sometimes force an administration scheme that is
sufficiently unusual to influence adversely the motivation and
compliance of the subjects. Ethical difficulties may also interfere
with its use when, for example, it entails dummy operative
procedures. Nevertheless, extensive efforts should be made to
overcome these difficulties.
In some clinical trials, although double blinding is planned, it
may be partially compromised by apparent treatment-induced effects.
In such cases, blinding may be improved by blinding investigators to
certain test results (e.g., selected clinical laboratory measures).
Approaches similar to those used to minimize bias in open-label
trials (see below) should be considered in trials where unique or
specific treatment effects may lead to unblinding of individual
patients.
If a double-blind trial is not feasible, then the single-blind
option should be considered. In some cases only an open-label trial
is practically or ethically possible. Single-blind and open-label
trials provide additional flexibility, but it is particularly
important that the investigator's knowledge of the next treatment
should not influence the decision to enter the subject; this
decision should precede knowledge of the randomized treatment. Also,
under either of these circumstances, clinical assessments should be
made by medical staff who are not involved in treating the subjects
and who remain blind to treatment. In single-blind or open-label
trials, every effort should be made to minimize the various known
sources of bias and primary variables should be as objective as
possible. The reasons for the degree of blinding adopted, as well as
steps taken to minimize bias by other means, should be explained in
the protocol.
Breaking the blind (for a single subject) should be considered
only when knowledge of the treatment assignment is deemed essential
by the subject's physician for the subject's care. Any intentional
or unintentional breaking of the blind should be reported and
explained at the end of the trial, irrespective of the reason for
its occurrence. The procedure and timing for revealing the treatment
assignments should be documented.
In this document, the blind review of data refers to the
checking of data during the period of time between trial completion
(the last observation on the last subject) and the breaking of the
blind. If specific sponsor staff need to be unblinded during this
period to ensure the integrity of the database or the suitability of
statistical assumptions, appropriate SOP's should be developed to
describe how the treatment code will be protected from broader
dissemination.
2.3.2 Randomization
Randomization introduces a deliberate element of chance into the
assignment of treatments to subjects in a clinical trial. During
subsequent analysis of the trial data, it provides a sound
statistical basis for the quantitative evaluation of the evidence
relating to treatment effects. It also tends to produce treatment
groups in which the distributions of prognostic factors (known and
unknown) are similar. In combination with blinding, randomization
helps to avoid possible bias in the selection and allocation of
subjects arising from the predictability of treatment assignments.
The randomization schedule of a clinical trial documents the
random allocation of treatments to subjects. In the simplest
situation, it is a sequential list of treatments (or treatment
sequences in a crossover trial) or corresponding codes by subject
number. The logistics of some trials, such as those with a screening
phase, may make matters more complicated, but the unique preplanned
assignment of treatment, or treatment sequence, to subject should be
clear. Different trial designs will require different procedures for
generating randomization schedules. The randomization schedule
should be capable of being reproduced (if the need arises). Whenever
possible, this should be accomplished through the use of the same
random number table, or the same computer routine and seed for its
random number generator.
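A minimal sketch of a reproducible schedule for the simplest
(unrestricted) case follows. The function name, the treatment labels,
and the seed value are illustrative assumptions. The point shown is
only that recording the routine and its seed allows the same list to
be regenerated, provided the same software version is used.

```python
import random


def simple_schedule(n_subjects, treatments, seed):
    """Return a list of (subject number, treatment) pairs produced
    by unrestricted randomization from a recorded seed."""
    rng = random.Random(seed)  # private generator; seed is documented
    return [(subject, rng.choice(treatments))
            for subject in range(1, n_subjects + 1)]


if __name__ == "__main__":
    first = simple_schedule(10, ("A", "B"), seed=12345)
    again = simple_schedule(10, ("A", "B"), seed=12345)
    print(first == again)  # the schedule is reproduced exactly
```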
Although unrestricted randomization is an acceptable approach,
some advantages can generally be gained by randomizing subjects in
blocks. This helps to increase the comparability of the treatment
groups particularly when subject characteristics may change over
time, as a result, for example, of changes in recruitment policy. It
also provides a better guarantee that the treatment groups will be
of nearly equal size. In cross-over trials, it provides the means of
obtaining balanced designs with their greater efficiency and easier
interpretation. Care should be taken to choose block lengths that
are sufficiently short to limit possible imbalance, but long enough
to avoid predictability towards the end of the sequence in a block.
Investigators should generally be blind to the block length; the use
of two or more block lengths, randomly selected for each block, can
achieve the same purpose. (Theoretically, in a double-blind trial
predictability does not matter, but the pharmacological effects of
drugs often provide the opportunity for intelligent guesswork.)
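Randomization in blocks with randomly selected block lengths might be
sketched as follows. This is an illustration under assumptions: equal
allocation to each treatment is assumed, the parameter
block_multiples (the number of copies of each treatment in a block)
is hypothetical, and the final block may be left incomplete when the
list is cut at the planned number of subjects.

```python
import random


def blocked_schedule(n_subjects, treatments, block_multiples, seed):
    """Build a treatment list from permuted blocks whose lengths are
    chosen at random for each block."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        m = rng.choice(block_multiples)  # copies of each treatment
        block = list(treatments) * m     # balanced within the block
        rng.shuffle(block)               # random order inside block
        schedule.extend(block)
    return schedule[:n_subjects]         # last block may be partial


if __name__ == "__main__":
    # Two treatments, block lengths of 2 or 4 chosen at random.
    print(blocked_schedule(12, ("A", "B"), (1, 2), seed=2024))
```

With two treatments and block lengths of 2 or 4, the running
difference in group sizes never exceeds 2 at any point in the list.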
In multicenter trials, the randomization procedures should be
organized centrally. It is advisable to have a separate random
scheme for each center, i.e., to stratify by center or to allocate
several whole blocks to each center. More generally, stratification
by important prognostic factors measured at baseline (e.g., severity
of disease, age, sex, etc.) may sometimes be valuable in order to
promote balanced allocation within strata; this has greater
potential benefit in small trials. The use of more than two or three
stratification factors is rarely necessary, is less successful at
achieving balance, and is logistically troublesome. Where it is
necessary, the use of a dynamic allocation procedure (see below) may
help to achieve balance across all factors simultaneously, provided
the rest of the trial procedures can be adjusted to accommodate an
approach of this type.
The next subject to be randomized into a study should always
receive the treatment corresponding to the next free number in the
appropriate randomization schedule (in the respective stratum, if
randomization is stratified). The appropriate number and associated
treatment for the next subject should only be allocated when entry
of that subject to the randomized part of the trial has been
confirmed. These tasks will normally be carried out by staff at the
investigator's center, who will then dispense the relevant blinded
trial supplies. Details of the
[[Page 25717]]
randomization which facilitate predictability (e.g., block length)
should not be contained in the study protocol. The randomization
schedule itself should be filed securely by the sponsor or an
independent party in a manner that ensures that blindness is
properly maintained throughout the trial. Access to the
randomization schedule during the trial should take into account the
possibility that, in an emergency, the blind may have to be broken
for any subject, either partially or completely. The procedure to be
followed, the necessary documentation, and the subsequent treatment
and assessment of the subject should all be described in the
protocol.
Dynamic allocation is an alternative randomization procedure in
which the allocation of treatment to a subject is influenced by the
current balance of allocated treatments and, in a stratified trial,
by the stratum to which the subject belongs and the balance within
that stratum. Every effort should be made to retain the double-blind
status of the trial. For example, knowledge of the treatment code
may be restricted to a central trial office from where the dynamic
allocation is controlled, generally through telephone contact. This
in turn permits additional checks of eligibility criteria and
establishes entry into the trial, features that can be valuable in
certain types of multicenter trials. The usual system of prepacking
and labeling drug supplies for double-blind trials can then be
followed, but the order of their use is no longer sequential. It is
desirable to use appropriate computer algorithms to keep personnel
at the central trial office blind to the treatment code. The
complexity of the logistics and potential impact on the analysis
should be carefully evaluated when considering dynamic allocation.
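One form of dynamic allocation, often called minimization, might be
sketched as follows. This is not a method prescribed by the
guideline; it is a simplified illustration. The imbalance measure
(range of counts within each factor level), the probability p_best of
following the preferred treatment, and all names are assumptions. The
sketch covers only the allocation logic and not the blinding,
eligibility checks, or supply logistics discussed above.

```python
import random


def minimization_assign(subject_factors, counts, treatments, rng,
                        p_best=0.8):
    """Allocate one subject and update counts in place.

    subject_factors: dict such as {"center": "01", "sex": "F"}
    counts: dict mapping (factor, level) to {treatment: number so far}
    """
    scores = {}
    for t in treatments:
        total = 0
        for key in subject_factors.items():
            current = counts.get(key, {})
            # Counts at this factor level if the subject received t.
            trial = [current.get(u, 0) + (1 if u == t else 0)
                     for u in treatments]
            total += max(trial) - min(trial)
        scores[t] = total
    best = min(scores.values())
    preferred = [t for t in treatments if scores[t] == best]
    if rng.random() < p_best:
        choice = rng.choice(preferred)         # reduce imbalance
    else:
        choice = rng.choice(list(treatments))  # retain a chance element
    for key in subject_factors.items():
        counts.setdefault(key, {t: 0 for t in treatments})[choice] += 1
    return choice


if __name__ == "__main__":
    rng = random.Random(7)
    counts = {}
    for sex in ("F", "M", "F", "F"):
        print(minimization_assign({"center": "01", "sex": sex},
                                  counts, ("A", "B"), rng))
```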
III. Study Design Considerations
3.1 Study Configuration
3.1.1 Parallel Group Design
The most common clinical trial design for confirmatory trials is
the parallel group design in which subjects are randomized to one of
two or more arms, each arm being allocated a different treatment.
These treatments will include the investigational product at one or
more doses, and one or more control treatments, such as placebo
and/or an active comparator. The assumptions underlying this design are
less complex than for most other designs. However, there may be
additional features of the design which complicate the analysis and
interpretation (e.g., covariates, repeated measurements over time,
interactions between design factors, protocol violations, dropouts,
and withdrawals).
3.1.2 Cross-Over Design
In the cross-over design, each subject is randomized to a
sequence of two or more treatments and hence acts as his own control
for treatment comparisons. This simple maneuver is attractive
primarily because it reduces the number of subjects and, usually,
the number of assessments needed to achieve a specific power,
sometimes to a marked extent. In the simplest 2x2 cross-over design,
each subject receives each of two treatments in randomized order in
two successive treatment periods, often separated by a washout
period. The most common extension of this entails comparing n(>2)
treatments in n periods, each subject receiving all n treatments.
Numerous variations exist, such as designs in which each subject
receives a subset of n(>2) treatments, or designs in which
treatments are repeated within a subject.
Cross-over designs have a number of problems which can
invalidate their results. The chief difficulty concerns carryover,
that is, the residual influence of treatments in subsequent
treatment periods. In an additive model, the effect of unequal
carryover will be to bias direct treatment comparisons. In the 2x2
design, the relevant contrast cannot be statistically distinguished
from the interaction between treatment and period, and the test for
either of these lacks power because it is a ``between subject''
contrast. This problem is less acute in higher order designs, but
cannot be entirely dismissed.
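The contrasts referred to above can be illustrated for the 2x2 design
with a small sketch. It assumes the usual additive model with
subject, period, treatment, and carryover effects, and it returns
point estimates only (no standard errors or tests). The function name
and the data layout are hypothetical.

```python
from statistics import mean


def crossover_2x2(seq_ab, seq_ba):
    """seq_ab: (period 1, period 2) responses of subjects given A
    then B; seq_ba: the same for subjects given B then A.
    Returns (treatment contrast A minus B, carryover contrast)."""
    # Within-subject differences remove the subject effect.
    d_ab = [p1 - p2 for p1, p2 in seq_ab]
    d_ba = [p1 - p2 for p1, p2 in seq_ba]
    # Half the difference of mean differences removes the period
    # effect, but is biased if carryover is unequal.
    treatment = (mean(d_ab) - mean(d_ba)) / 2.0
    # Subject totals are compared BETWEEN sequence groups, so this
    # contrast carries between-subject variability.
    s_ab = [p1 + p2 for p1, p2 in seq_ab]
    s_ba = [p1 + p2 for p1, p2 in seq_ba]
    carryover = mean(s_ab) - mean(s_ba)
    return treatment, carryover


if __name__ == "__main__":
    print(crossover_2x2([(10, 7), (12, 9)], [(7, 10), (9, 12)]))
```

Because the second contrast is based on subject totals, it is
estimated with between-subject variability, which is why a test based
on it lacks power.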
Therefore, when the cross-over design is used, it is important
to avoid carryover. This is best done by selective and careful use
of the design on the basis of adequate knowledge of both the disease
area and the new medication. The disease under study should be
chronic and stable. The relevant effects of the medication should
develop fully within the treatment period. The washout periods
should be sufficiently long for complete reversibility of drug
effect. The fact that these conditions are likely to be met should
be established in advance of the trial by means of prior information
and data.
A common, and generally satisfactory, use of the 2x2 cross-over
design is to demonstrate the bioequivalence of two formulations of
the same medication. In this particular application in healthy
volunteers, carryover effects on the relevant pharmacokinetic
variable are rather unlikely to occur if the washout time between
the two periods is sufficiently long. However, it is still important
to check this assumption during analysis on the basis of the data
obtained, for example, by demonstrating that no drug is detectable
at the start of each period.
There are additional problems that need careful attention in
cross-over trials. The most notable of these are the complications
of analysis and interpretation arising from the loss of subjects.
Also, the potential for carryover leads to difficulties in assigning
adverse events that occur in later treatment periods to the
appropriate treatment. These and other issues are described in the
ICH E4 topic on ``Dose-Response Information to Support Drug
Registration.'' The cross-over design should generally be restricted
to situations where losses of subjects from the trial are expected
to be small.
3.1.3 Factorial Designs
In a factorial design, two or more treatments are evaluated
simultaneously in the same set of subjects through the use of
varying combinations of the treatments. The simplest example is the
2x2 factorial design in which subjects are randomly allocated to one
of the four possible combinations of two treatments, A and B. These
are: A alone; B alone; both A and B; neither A nor B. In many cases
this design is used for the specific purpose of examining the
interaction of A and B. The statistical test of interaction is model
dependent and may lack power to detect an interaction if the sample
size was calculated based on the test for main effects. This
consideration is important when this design is used for examining
the joint effects of A and B, in particular, if the treatments are
likely to be used together.
Another important use of the factorial design is to establish
the dose-response characteristics of a combination product, e.g.,
one combining treatments C and D, especially when the efficacy of
each monotherapy has been established at some dose in prior studies.
A number, m, of doses of C is selected, usually including a zero
dose (placebo), and a similar number, n, of doses of D. The full
design then consists of mn treatment groups, each receiving a
different combination of doses of C and D. The resulting estimate of
the response surface may then be used to help identify an
appropriate combination of doses of C and D for clinical use.
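The layout of the mn treatment groups can be enumerated as in the
sketch below. The dose values are purely illustrative and the
function name is hypothetical.

```python
from itertools import product


def factorial_groups(doses_c, doses_d):
    """List every combination of a dose of C with a dose of D."""
    return list(product(doses_c, doses_d))


if __name__ == "__main__":
    # m = 3 doses of C and n = 3 doses of D, each including zero
    # (placebo), give mn = 9 treatment groups.
    for group in factorial_groups((0, 5, 10), (0, 20, 40)):
        print(group)
```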
In some cases, the 2x2 design may be used to make efficient use
of clinical trial subjects by evaluating the efficacy of the two
treatments with the same number of subjects as would be required to
evaluate the efficacy of either one alone. This strategy has proved
to be particularly valuable for very large mortality studies. The
efficiency of this approach depends upon the absence of interaction
between treatments A and B so that the effects of A and B on the
primary efficacy variables follow an additive model, hence the
effect of A is virtually identical whether or not it is additional
to the effect of B. As for the cross-over trial, evidence that this
condition is likely to be met should be established in advance of
the trial by means of prior information and data.
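The role of the additive model can be illustrated with the four cell
means of a 2x2 factorial design. The sketch below is a simplified
illustration with hypothetical names and values; it gives point
estimates only and assumes equal numbers of subjects in each cell.

```python
def factorial_2x2_contrasts(neither, a_alone, b_alone, both):
    """Return (main effect of A, main effect of B, interaction)
    computed from the four cell means."""
    effect_a_without_b = a_alone - neither
    effect_a_with_b = both - b_alone
    main_a = (effect_a_without_b + effect_a_with_b) / 2.0
    main_b = ((b_alone - neither) + (both - a_alone)) / 2.0
    # Zero under an additive model: A has the same effect with or
    # without B.
    interaction = effect_a_with_b - effect_a_without_b
    return main_a, main_b, interaction


if __name__ == "__main__":
    print(factorial_2x2_contrasts(10.0, 14.0, 13.0, 17.0))  # additive
    print(factorial_2x2_contrasts(10.0, 14.0, 13.0, 20.0))  # not
```

When the interaction contrast is near zero, all subjects contribute
to the estimate of each main effect, which is the source of the
efficiency described above.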
3.2 Multicenter Trials
Multicenter trials are carried out for two main reasons. First,
a multicenter trial is an accepted way of evaluating a new
medication more efficiently; under some circumstances, it may
present the only practical means of accruing sufficient subjects to
satisfy the trial objective within a reasonable timeframe.
Multicenter trials of this nature may, in principle, be carried out
at any stage of clinical development. They may have several centers
with a large number of subjects per center or, in the case of a rare
disease, they may have a large number of centers with very few
subjects per center.
Second, a trial may be designed as a multicenter (and multi-
investigator) trial primarily to provide a better basis for the
subsequent generalization of its findings. This arises from the
possibility of recruiting the subjects from a wider population and
of administering the medication in a broader range of clinical
settings, thus presenting an experimental situation that is more
typical of future use. In this case, the involvement of a number of
investigators also gives the potential for a wider range of clinical
judgement concerning the value of the medication. Such a trial would
be a confirmatory trial in the later phases of drug development and
would be likely to involve a large number of investigators and
centers.
[[Page 25718]]
It might sometimes be conducted in a number of different countries
to facilitate generalizability even further.
If a multicenter trial is to be meaningfully interpreted and
extrapolated, then the manner in which the protocol is implemented
should be clear and similar at all centers. Furthermore, the usual
sample size and power calculations depend upon the assumption that
the differences between the compared treatments in the centers are
unbiased estimates of the same quantity. It is important to design
the common protocol and to conduct the trial with this background in
mind. Procedures should be standardized as completely as possible.
Variation of evaluation criteria and schemes can be reduced by
investigator meetings, by the training of personnel in advance of
the study, and by careful monitoring during the study. Good design
should generally aim to achieve the same distribution of subjects to
treatments within each center and good management should maintain
this design objective. Trials which avoid excessive variation in the
numbers of subjects per center and trials which avoid a few very
small centers have advantages if it is later found necessary to
examine the heterogeneity of the treatment effect from center to
center, because they reduce the differences between different
weighted estimates of the treatment effect. (This point does not
apply to trials in which all centers are very small and in which
center does not feature in the analysis.) Failure to take these
precautions, combined with doubts about the homogeneity of the
results, may, in severe cases, reduce the value of a multicenter
trial to such a degree that it cannot be regarded as giving
convincing evidence for the sponsor's claims.
In the simplest multicenter trial, each investigator will be
responsible for the subjects recruited at one hospital, so that
``center'' is identified uniquely by either investigator or
hospital. In many trials, however, the situation is more complex.
One investigator may recruit subjects from several hospitals; one
investigator may represent a team of clinicians (subinvestigators)
who all recruit subjects from their own clinics at one hospital or
at several associated hospitals. Whenever there is room for doubt
about the definition of center in a statistical model, the
statistical section of the protocol (see section 5.1) should clearly
define the term (e.g., by investigator, location, or region) in the
context of the particular trial. In most instances, centers can be
satisfactorily defined through the investigators. (ICH Guideline E6
provides relevant guidance in this respect.) In cases of doubt, the
aim should be to define centers to achieve homogeneity in the
important factors affecting the measurements of the primary
variables and the influence of the treatments. Any rules for
combining centers in the analysis should be justified and specified
prospectively in the protocol where possible, but in any case
decisions concerning this approach should always be taken blind to
treatment, for example, at the time of the blind review. It is
sometimes possible to characterize the centers by historical
measures of response to the control treatment or to other standard
treatments, and this information may help to support decisions
concerning the combination of centers for analysis.
The statistical model to be adopted for the comparison of
treatments should be described in the protocol. The main treatment
effect may be investigated first using a model that allows for
center differences, but does not include a term for center by
treatment interaction. In the absence of a true center by treatment
interaction, the routine inclusion of interaction terms in the model
reduces the efficiency of the test for the main effects. In the
presence of a true center by treatment interaction, the
interpretation of the main treatment effect is controversial.
In some studies, for example, some large mortality studies with
very few subjects per center, there may be no reason to expect the
centers to have any influence on the primary or secondary variables
because they are unlikely to represent influences of clinical
importance. In other studies, it may be recognized from the start
that the limited numbers of subjects per center will make it
impracticable to include the center effects in the statistical
model. In these cases, it is not appropriate to include a term for
center in the model, because in this situation randomization is
rarely stratified by center.
If positive treatment effects are found in a trial with
appreciable numbers of subjects per center, there should generally
be a subsequent exploration of treatment by center interaction, as
this may affect the generalizability of the conclusions. Marked
treatment by center interaction may be identified by graphical
display of the results of individual centers or by analytical
methods, such as a significance test of the interaction. When using
such a statistical significance test, it is important to recognize
that this generally has low power in a trial designed to detect the
main effect of treatment.
If a treatment by center interaction is found, this should be
interpreted with care and vigorous attempts should be made to find
an explanation in terms of other features of trial management or
subject characteristics. Such an explanation will usually define the
appropriate further analysis and interpretation. In the absence of
an explanation, marked quantitative interactions imply that
alternative estimates of the treatment effect may be needed, giving
different weights to the centers, in order to substantiate the
robustness of the estimates of treatment effect. It is even more
important to understand the basis of any marked qualitative
interactions, and failure to find an explanation may necessitate
further clinical trials before the treatment effect can be reliably
predicted.
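The effect of giving different weights to the centers can be
illustrated as follows. This is a sketch under assumptions: the two
weighting schemes shown (equal weights, and weights based on the
numbers of subjects per treatment in each center) are common choices
but are not prescribed by the guideline, and the names and values are
hypothetical.

```python
def weighted_effect(center_effects, weights):
    """Weighted average of center-specific treatment differences."""
    total = sum(weights)
    return sum(e * w for e, w in zip(center_effects, weights)) / total


def size_based_weights(center_sizes):
    """Weights n_t * n_c / (n_t + n_c) for each center, where n_t
    and n_c are the numbers on test treatment and on control."""
    return [nt * nc / (nt + nc) for nt, nc in center_sizes]


if __name__ == "__main__":
    effects = [2.0, 4.0]         # treatment difference per center
    sizes = [(10, 10), (2, 2)]   # one large and one very small center
    print(weighted_effect(effects, [1.0, 1.0]))
    print(weighted_effect(effects, size_based_weights(sizes)))
```

In this example the two estimates differ (3.0 with equal weights,
about 2.33 with size-based weights) because the centers differ both
in size and in observed effect. Large differences between such
estimates indicate that conclusions are sensitive to the weighting.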
3.3 Type of Comparison
3.3.1 Trials to Show Superiority
Scientifically, efficacy is most convincingly established by
demonstrating superiority to placebo in a placebo-controlled trial,
by showing superiority to an active control treatment, or by
demonstrating a dose-response relationship. This type of trial is
referred to as a ``superiority'' trial (see section 5.2.3). In this
guideline, superiority trials are generally assumed unless
explicitly stated otherwise.
For serious illnesses, when a therapeutic treatment that has
been shown to be efficacious by superiority trial(s) exists, a
placebo-controlled trial may be considered unethical. In that case,
the scientifically sound use of the active control should be
considered. The appropriateness of placebo control versus active
control should be considered on a study-by-study basis.
3.3.2 Trials to Show Equivalence or Noninferiority
In some cases, an investigational product is compared to a
reference treatment without the objective of showing superiority.
This type of trial is divided into two major categories according to
its objective; one is an ``equivalence'' trial and the other is a
``noninferiority'' trial.
Bioequivalence trials fall into the former category. In some
situations, clinical equivalence trials are also undertaken for
other regulatory reasons, such as demonstrating the clinical
equivalence of a generic product to the marketed product when the
compound is not absorbed and therefore not present in the blood
stream.
Many active control trials are designed to show that the
efficacy of an investigational product is no worse than that of the
active comparator, and hence fall into the latter category. Another
possibility is a ``relative potency assay,'' which is a study where
multiple doses of the investigational drug are compared with the
recommended dose or multiple doses of the standard drug.
Active control equivalence or noninferiority trials may also
incorporate a placebo, thus pursuing multiple goals in one trial,
for example, establishing superiority to placebo (thereby validating
the study design) and evaluating the degree of similarity of efficacy
and safety to the active comparator. There are well-known
limitations associated with the use of the active control
equivalence (or noninferiority) trials that do not incorporate a
placebo. These relate to the implicit lack of any measure of
internal validity (in contrast to superiority trials), thus making
external validation necessary. The equivalence (or noninferiority)
trial is not conservative in nature, so many flaws in the design or
conduct of the trial will tend to bias the results towards a
conclusion of equivalence. For these reasons, the design features of
such trials should receive special attention.
Active comparators should be chosen with care. An example of a
suitable active comparator would be a widely used therapy whose
efficacy in the relevant indication has been clearly established and
quantified in well-designed and well-documented superiority trial(s)
and that can be reliably expected to exhibit similar efficacy in the
contemplated active control study. To this end, the new trial should
have the same important design features (primary variables, the dose
of the active comparator, eligibility criteria, etc.) as the
previously conducted superiority trials in which the active
comparator clearly demonstrated clinically relevant efficacy.
It is vital that the protocol of a trial designed to demonstrate
equivalence or
[[Page 25719]]
noninferiority contain a clear statement that this is its explicit
intention. An equivalence margin should be specified in the
protocol; this margin is the largest difference which can be judged
as being clinically acceptable. For the active control equivalence
trial, both the upper and the lower equivalence margins are needed,
while for the active control noninferiority trial, only the lower
margin is needed. There should be clinical justification for the
choice of equivalence margins.
Statistical analysis is generally based on the use of confidence
intervals (see section 5.5). For equivalence trials, the two-sided
(1 - 2 alpha) confidence limits should be used. Equivalence
is inferred when the entire confidence interval falls within the
equivalence margins. This is equivalent to the method of using two
simultaneous one-sided tests to test the (composite) null hypothesis
that the treatment difference is outside of the equivalence margins
versus the (composite) alternative that the treatment difference is
within the limits. With this method, the Type I error is controlled
at a level of alpha. For noninferiority trials, the one-sided
(1 - alpha) interval should be used. The confidence interval approach
has a one-sided hypothesis test counterpart testing the null
hypothesis that the treatment difference (investigational product
minus control) is equal to the lower equivalence margin versus the
alternative that the treatment difference is greater than the lower
equivalence margin. Sample size calculations should be based on
these methods (see section 3.5). The choice of alpha should be a
consideration separate from the choice of a one-sided or two-sided
test.
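The confidence interval rules above can be illustrated with a short
sketch. It is not part of the guideline: the function names, the data,
and the margins of -2 and +2 are hypothetical, and a large-sample
normal approximation is assumed for simplicity (a t-based interval
would usually be preferred for samples this small).

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_ci(test, control, alpha=0.025):
    """Two-sided (1 - 2*alpha) limits for mean(test) - mean(control).

    The lower limit on its own is the one-sided (1 - alpha) bound.
    Normal approximation; assumed here only for illustration.
    """
    diff = mean(test) - mean(control)
    se = sqrt(variance(test) / len(test) + variance(control) / len(control))
    z = NormalDist().inv_cdf(1 - alpha)
    return diff - z * se, diff + z * se


def is_equivalent(ci, lower_margin, upper_margin):
    """Equivalence: the entire interval lies within both margins."""
    lower, upper = ci
    return lower_margin < lower and upper < upper_margin


def is_noninferior(ci, lower_margin):
    """Noninferiority: only the lower limit is compared with the margin."""
    return ci[0] > lower_margin


# Hypothetical outcome data for the two groups.
control = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 12.0, 11.0]
test = [10.5, 11.5, 11.0, 9.5, 12.5, 10.5, 11.5, 11.0]

ci = diff_ci(test, control)
print("interval:", ci)
print("equivalent within (-2, 2):", is_equivalent(ci, -2.0, 2.0))
print("noninferior with margin -2:", is_noninferior(ci, -2.0))
```

Note that the decision is read from the interval and the prespecified
margins, not from a nonsignificant test of no difference.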
It is inappropriate to conclude equivalence or noninferiority
based on observing a nonsignificant test result of the null
hypothesis that there is no difference between the investigational
product and the active comparator.
There are also special issues in the choice of analysis sets.
Subjects who withdraw or drop out of the treatment group or the
comparator group will tend to have a lack of response; hence, the
analysis of all randomized subjects may be biased toward
demonstrating equivalence (see section 5.2.3).
3.3.3 Dose-Response Designs
How response is related to the dose of a new investigational
product is a question to which answers may be obtained in all phases
of development and by a variety of approaches (see ICH E4). Dose-
response studies may serve a number of objectives, among which the
following are of particular importance: The confirmation of
efficacy; the investigation of the shape and location of the dose-
response curve; the estimation of an appropriate starting dose; the
identification of optimal strategies for individual dose
adjustments; the determination of a maximal dose beyond which
additional benefit would be unlikely to occur. These objectives
should be addressed using the data collected at a number of doses
under investigation, including a placebo (zero dose) wherever
appropriate. For this purpose, the application of estimation
procedures, including the construction of confidence intervals, and
of graphical methods is as important as the use of statistical
tests. The hypothesis tests that are used may need to be tailored to
the natural ordering of doses or to particular questions regarding
the shape of the dose-response curve (e.g., monotonicity). The
details of the planned statistical procedures should be given in the
protocol.
3.4 Group Sequential Designs
Group sequential designs are used to facilitate the conduct of
interim analysis (see section 4.5). While group sequential designs
are not the only acceptable types of designs permitting interim
analysis, they are the most commonly applied because it is more
practicable to assess grouped subject outcomes at periodic intervals
during the trial than on a continuous basis as data from each
subject become available. The statistical methods should be fully
specified in advance of the availability of information on treatment
outcomes and subject treatment assignments (i.e., blind breaking,
see section 4.5). An independent data monitoring committee (IDMC)
may be used to conduct the interim analysis of data arising from a
group sequential design (see section 4.6). While the design has been
most widely and successfully used in large, long-term trials of
mortality or major nonfatal endpoints, its use is growing in other
circumstances. In particular, it is recognized that safety must be
monitored in all trials; therefore, the need for formal procedures
to cover early stopping for safety reasons should always be
considered.
3.5 Sample Size
The number of subjects in a clinical trial should always be
large enough to provide a reliable answer to the questions
addressed. This number is usually determined by the primary
objective of the trial. If the sample size is determined on some
other basis, this should be made clear and justified. For example, a
trial sized on the basis of safety questions or requirements may
need larger numbers of subjects than one sized on the basis of
efficacy questions. (See, for example, ICH E1A ``Population
Exposure: The Extent of Population Exposure to Assess Clinical
Safety.'')
When determining the appropriate sample size, the following
items should be specified: A primary variable; the test statistic;
the null hypothesis; the alternative (``working'') hypothesis at the
chosen dose(s) (embodying consideration of the treatment difference
to be detected or rejected at the dose and in the subject population
selected); the probability of erroneously rejecting the null
hypothesis (the Type I error) and the probability of erroneously
failing to reject the null hypothesis (the Type II error); as well
as the approach to dealing with treatment withdrawals and protocol
violations. In some instances, the event rate is of primary interest
for evaluating power, and assumptions should be made to extrapolate
from the required number of events to the eventual sample size for
the trial.
The method by which the sample size is calculated should be
given in the protocol, together with the estimates of any quantities
used in the calculations (such as variances, mean values, response
rates, event rates, difference to be detected). The basis of these
estimates should also be given. It is important to investigate the
sensitivity of the sample size estimate to a variety of deviations
from these assumptions and this may be facilitated by providing a
range of sample sizes appropriate for a reasonable range of
deviations from assumptions. In confirmatory studies, assumptions
should normally be based on published data or on the results of
earlier studies. The treatment difference to be detected may be
based on a judgement concerning the minimal effect that has clinical
relevance in the management of patients or on a judgement concerning
the anticipated effect of the new treatment, where this is larger.
Conventionally, the probability of Type I error is set at 5 percent
or less or as dictated by any adjustments made necessary for
multiplicity considerations; the precise choice is influenced by the
prior plausibility of the hypothesis under test and the desired
impact of the results. The probability of Type II error is
conventionally set at 20 percent or less; it is in the sponsor's
interest to keep this figure as low as feasible, especially in the
case of studies that are difficult or impossible to repeat.
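As an illustration of the quantities listed above, the following
sketch computes a per-group sample size for a two-sided comparison of
two means and tabulates it over a range of assumptions. The formula is
the usual normal-approximation one, and the function name and all
numerical inputs are hypothetical; a real protocol would state and
justify its own method.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Subjects per group to detect a mean difference `delta`.

    Normal approximation with a two-sided Type I error `alpha`, a
    Type II error of 1 - `power`, and a common standard deviation `sd`.
    """
    z = NormalDist().inv_cdf
    return ceil(2 * (sd * (z(1 - alpha / 2) + z(power)) / delta) ** 2)


# Sensitivity of the sample size to deviations from the assumptions.
for sd in (8.0, 10.0, 12.0):
    for delta in (4.0, 5.0, 6.0):
        print(f"sd={sd:5.1f}  delta={delta:4.1f}  n={n_per_group(delta, sd)}")
```

Tabulating the result over plausible standard deviations and treatment
differences is one simple way to present the sensitivity analysis
recommended above.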
Sample size calculations should refer to the number of subjects
required for the primary analysis. If this is the ``all randomized
subjects'' set, estimates about the effect size may need to be
reduced compared to the per protocol set. This is due to the
diluting effect of the inclusion of treatment withdrawals. The
assumptions of variability may also need to be revised.
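A rough sketch of the dilution effect follows. It rests on a strong
simplifying assumption that is not taken from the guideline: a known
fraction of randomized subjects withdraw and contribute no treatment
effect at all, while variability is unchanged. The function names are
hypothetical.

```python
def diluted_effect(per_protocol_effect, withdrawal_fraction):
    """Expected effect in all randomized subjects under the assumption
    that withdrawn subjects show no treatment effect."""
    return per_protocol_effect * (1.0 - withdrawal_fraction)


def sample_size_inflation(withdrawal_fraction):
    """Factor by which the sample size grows, since the required
    sample size is proportional to 1 / effect**2."""
    return 1.0 / (1.0 - withdrawal_fraction) ** 2


print(diluted_effect(5.0, 0.20))
print(sample_size_inflation(0.20))
```

Under these assumptions, a 20 percent withdrawal rate reduces the
expected effect by one fifth and raises the required sample size by
more than half.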
The sample size of an equivalence trial or a noninferiority
trial (see section 3.3.2) should normally be based on the objective
of obtaining a confidence interval for the treatment difference that
shows that the treatments differ at most by a clinically acceptable
difference. For equivalence trials, the power is usually assessed at
a true difference of zero, but the sample size needed to achieve
that power is underestimated if the true difference is not zero.
For noninferiority trials, the power is usually assessed at an
expected (nonzero) difference, but the sample size needed is
underestimated if the true difference is less than expected. The
choice of a ``clinically acceptable'' difference needs
justification, and may be smaller than the ``clinically relevant''
difference referred to above in the context of superiority trials
designed to establish that a difference exists.
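The dependence of the sample size on the assumed true difference can
be sketched with the standard normal-approximation formulas for
comparing two means. These formulas, the function names, and the
numerical inputs are assumptions made for illustration; they are not
prescribed by the guideline.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist().inv_cdf


def n_equivalence(margin, sd, alpha=0.025, power=0.80):
    """Per-group n for symmetric margins (-margin, +margin), assuming
    the true difference is zero; alpha is the level of each one-sided
    test."""
    beta = 1 - power
    return ceil(2 * (sd * (Z(1 - alpha) + Z(1 - beta / 2)) / margin) ** 2)


def n_noninferiority(margin, sd, true_diff=0.0, alpha=0.025, power=0.80):
    """Per-group n for a lower margin of -margin; `true_diff` is the
    assumed true difference (investigational product minus control)."""
    beta = 1 - power
    return ceil(2 * (sd * (Z(1 - alpha) + Z(1 - beta)) / (margin + true_diff)) ** 2)


print(n_equivalence(5.0, 10.0))
print(n_noninferiority(5.0, 10.0))
print(n_noninferiority(5.0, 10.0, true_diff=-1.0))
```

In this sketch, assuming the investigational product is truly one unit
worse than the control raises the noninferiority sample size well
above the figure obtained under a true difference of zero.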
The sample size in a group sequential trial cannot be fixed in
advance because it depends upon the play of chance in combination
with the chosen stopping rule and the true treatment difference. The
design of the stopping rule should take into account the consequent
distribution of the sample size, usually embodied in the expected
and maximum sample sizes.
When event rates are lower than anticipated or variability is
larger than expected, methods for sample size reestimation are
available without unblinding data or making treatment comparisons
(see section 4.4).
3.6 Data Capture and Processing
The collection of data and transfer of data from the
investigator to the sponsor can take place through a variety of
media, including paper case record forms, remote site
monitoring systems, medical computer systems, and electronic
transfer. Whatever data capture instrument is used, the form and
content of the information collected should be in full accordance
with the protocol and should be established in advance of the
conduct of the clinical trial. It should focus on the data necessary
to implement the analysis plan, including the context information
(such as timing assessments relative to dosing) necessary to confirm
protocol compliance or identify important protocol deviations.
``Missing values'' should be distinguishable from the ``value zero''
or ``characteristic absent.''
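One way to keep missing values distinct from a recorded zero is to
give them their own representation at data capture. The sketch below
is illustrative only; the function name and the codes "NA" and "ND"
are hypothetical and would be defined by the trial's own data
conventions.

```python
from typing import Optional


def parse_count(raw: str) -> Optional[int]:
    """Convert a raw case record form field to a count.

    A blank field or a missing-data code becomes None, so that it can
    never be confused with a recorded value of zero.
    """
    text = raw.strip()
    if text == "" or text.upper() in {"NA", "ND"}:
        return None
    return int(text)


readings = [parse_count(raw) for raw in ["0", "", "3", "NA"]]
n_missing = sum(value is None for value in readings)
n_zero = sum(value == 0 for value in readings)
print(readings, "missing:", n_missing, "zero:", n_zero)
```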
The process of data capture, through to database finalization,
should be carried out in accordance with good clinical practice
(GCP) (see ICH E6, section 5). Specifically, timely and reliable
processes for recording data and rectifying errors and omissions are
necessary to ensure delivery of a quality database and the
achievement of the trial objectives through the implementation of
the analysis plan.
IV. Study Conduct
4.1 Trial Monitoring
Careful conduct of a clinical trial according to the protocol
has a major impact on the credibility of the results. Careful
monitoring can ensure that difficulties are noticed early and their
occurrence or recurrence minimized.
There are two distinct types of monitoring that generally
characterize confirmatory clinical trials sponsored by the
pharmaceutical industry. The two types of trial monitoring, in
addition to entailing different staff responsibilities, involve
access to different types of study data and information; different
principles therefore apply for the control of potential statistical
and operational bias.
One type of monitoring concerns the oversight of the quality of
the trial, including whether the protocol is being followed,
acceptability of data being accrued, success of planned accrual
targets, checking the design assumptions, etc. (see sections 4.2 to
4.4). This type of monitoring does not require access to information
on comparative treatment effects, nor unblinding of data, and
therefore has no impact on Type I error. The monitoring of a trial
for this purpose is the responsibility of the sponsor and can be
carried out by the sponsor or an independent group selected by the
sponsor. The period for this type of monitoring usually starts with
the selection of the study sites and ends with the collection and
cleaning of the last subject's data.
The other type of trial monitoring involves breaking the blind
to make treatment comparisons. It therefore involves the accruing of
comparative treatment results, which requires that a protocol (or
appropriate amendments prior to a first analysis) contain
statistical plans to prevent certain types of bias. This type of
trial monitoring involves unblinded (i.e., key breaking) access to
treatment group assignment (actual treatment assignment or
identification of group assignment) and comparative treatment group
summary information. This type of monitoring is discussed in
sections 4.5 and 4.6.
4.2 Changes in Inclusion and Exclusion Criteria
Inclusion and exclusion criteria should remain constant, as
specified in the protocol, throughout the period of subject
recruitment. Occasionally, however, changes may be appropriate; in
long-term studies, for example, growing medical knowledge either
from outside the trial or from interim analyses may suggest a change
of entry criteria. Changes may also result from the discovery by
monitoring staff that regular violations of the entry criteria are
occurring, or that seriously low recruitment rates are due to over-
restrictive criteria. Changes should be made without breaking the
blind and should always be described by a protocol amendment that
should cover any statistical consequences, such as sample size
adjustments arising from different event rates, or modifications to
the analysis plan, such as stratifying the analysis according to
modified inclusion/exclusion criteria.
4.3 Accrual Rates
In studies with a long time-scale for the accrual of subjects,
the rate of accrual should be monitored; if it falls appreciably
below the projected level, the reasons should be identified and
remedial actions taken to protect the power of the trial and allay
concerns about selective entry and other aspects of quality. In a
multicenter trial, these considerations apply to the individual
centers.
4.4 Sample Size Adjustment
In long-term trials, there will usually be an opportunity to
check the assumptions which underlie the original design and sample
size calculations. This may be particularly important if the trial
specifications have been made on preliminary and/or uncertain
information. An interim check conducted on the blinded data may
reveal that overall response variances, event rates, or survival
experience are not as anticipated. A revised sample size may then be
calculated using suitably modified assumptions, and should be
justified and documented in a protocol amendment and in the final
report. The steps taken to preserve blindness and the consequences,
if any, for the Type I error and the width of confidence intervals
should be explained. The potential need for reestimation of the
sample size should be envisaged in the protocol whenever possible
(see section 3.5).
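A blinded check of the variance assumption might be sketched as
follows. The approach shown (pooling all observations without
treatment labels, and never reducing the planned sample size) is one
common convention and is an assumption of this example, as are the
function names and inputs. The pooled blinded variance tends to
overstate the within-group variance slightly when a treatment
difference exists.

```python
from math import ceil
from statistics import NormalDist, variance


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for two means."""
    z = NormalDist().inv_cdf
    return ceil(2 * (sd * (z(1 - alpha / 2) + z(power)) / delta) ** 2)


def reestimate_n(blinded_values, delta, planned_n, alpha=0.05, power=0.80):
    """Recompute n from pooled data that carry no treatment labels.

    The result is never smaller than the originally planned n.
    """
    blinded_sd = variance(blinded_values) ** 0.5
    return max(planned_n, n_per_group(delta, blinded_sd, alpha, power))


# Hypothetical interim data, pooled over both (unidentified) groups.
interim = [-12.0, 12.0] * 10
print(reestimate_n(interim, delta=5.0, planned_n=63))
```

Because no treatment comparison is made, a check of this kind does not
require breaking the blind.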
4.5 Interim Analysis and Early Stopping
Any analysis intended to compare treatment arms with respect to
efficacy or safety at any time prior to formal completion of a trial
is an interim analysis. Because the number, methods, and
consequences of these comparisons affect the interpretation of the
trial, all interim analyses should be carefully planned in advance
and described in the protocol, or otherwise specified in amendments
prior to unblinded access to treatment comparison data. When an
interim analysis is planned with the intention of deciding whether
or not to terminate a trial, this is usually accomplished by the use
of a group sequential design that employs statistical monitoring
schemes as guidelines (see section 3.4). The goal of such an interim
analysis is to stop the trial early if the superiority of the
treatment under study is clearly established, if the demonstration
of a relevant treatment difference has become unlikely, or if
unacceptable adverse effects are apparent. Generally, boundaries for
monitoring efficacy require more evidence to terminate a trial early
(i.e., they are more conservative) than do boundaries to terminate a
trial for safety reasons. When the trial design and monitoring objective
involve multiple endpoints, then this aspect of multiplicity may
also need to be taken into account.
The schedule of interim analyses, or at least the considerations
which will govern its generation, should be stated in the protocol
or a protocol amendment before the time of the first interim
analysis; as flexible statistical methods are available to conduct
interim analyses according to a variety of needs (early or late in a
trial), the stopping guidelines and their properties should be
clearly stated in the protocol or amendments. This material should
be written or approved by the data monitoring committee, when the
study has one (see section 4.6). Deviations from the planned
procedure always bear the potential of invalidating the study
results. If it becomes necessary to make changes to the trial, any
consequent changes to the statistical procedures should be specified
in an amendment to the protocol at the earliest opportunity,
especially discussing the impact on any analysis and inferences that
such changes may cause. The procedures selected should always ensure
that the overall probability of Type I error is controlled.
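The need to control the overall Type I error can be illustrated by
simulation. The sketch below assumes five equally spaced looks at a
normally distributed test statistic under the null hypothesis and
compares an unadjusted critical value of 1.96 at every look with the
constant Pocock-type critical value of 2.413 for five looks. The
design, the function names, and the simulation settings are
assumptions of the example, not recommendations.

```python
import random
from math import sqrt


def rejects_at_any_look(rng, looks, critical_z):
    """Simulate one trial under the null hypothesis.

    Each stage adds one unit of information; the trial "rejects" if
    the standardized statistic crosses the boundary at any look.
    """
    cumulative = 0.0
    for k in range(1, looks + 1):
        cumulative += rng.gauss(0.0, 1.0)
        if abs(cumulative / sqrt(k)) > critical_z:
            return True
    return False


def overall_type_one_error(critical_z, looks=5, trials=20000, seed=1):
    """Monte Carlo estimate of the overall two-sided Type I error."""
    rng = random.Random(seed)
    hits = sum(rejects_at_any_look(rng, looks, critical_z) for _ in range(trials))
    return hits / trials


naive_rate = overall_type_one_error(1.96)
pocock_rate = overall_type_one_error(2.413)
print("unadjusted 1.96 at each look:", naive_rate)
print("constant 2.413 at each look: ", pocock_rate)
```

With unadjusted testing, the estimated overall error rate is well
above the nominal 5 percent, whereas the adjusted boundary keeps it
close to 5 percent.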
The execution of an interim analysis should be a completely
confidential process because unblinded data and results are
potentially involved. All staff involved in the conduct of the trial
should remain blind to the results of such analyses because of the
possibility that their attitudes to the trial will be modified and
cause changes in recruitment patterns or biases in treatment
comparisons. This principle applies to the investigators and their
staff and to staff employed by the sponsor that come into contact
with clinic staff or subjects. Investigators should be informed only
about the decision to continue or to discontinue the trial, or to
implement modifications to trial procedures.
Most clinical trials intended to support the efficacy and safety
of an investigational product should proceed to full completion of
planned sample size accrual; trials should be stopped early only for
ethical reasons or if the power is no longer acceptable. However, it
is recognized that drug development plans involve the need for
sponsor access to comparative treatment data for a variety of
reasons, such as planning other studies or when only a subset of
trials will involve the study of serious life-threatening outcomes
or mortality which may need sequential monitoring of accruing
comparative treatment effects for ethical reasons. In either of
these situations, plans for interim statistical analysis should be
in place in the protocol or in protocol amendments prior to the
unblinded access to comparative treatment data in order to deal with
the
potential statistical and operational bias that may be introduced.
For many clinical trials of investigational products, especially
those that have major public health significance, the responsibility
for monitoring comparisons of efficacy and/or safety outcomes should
be assigned to an external, independent group, often called an
independent data monitoring committee (IDMC), a data and safety
monitoring board, or a data monitoring committee, whose
responsibilities should be clearly described.
When a sponsor assumes the role of monitoring efficacy or safety
comparisons and therefore has access to unblinded comparative
information, particular care should be taken to protect the
integrity of the trial and the sharing of information. The sponsor
should ensure and document that the internal monitoring committee
has complied with written SOP's and that minutes of decisionmaking
meetings are maintained.
Any interim analysis that is not planned in the protocol or
specified in an amendment to the protocol prior to unblinding the
data (with or without the consequences of stopping the trial early)
may flaw the results of a trial and possibly weaken confidence in
the conclusions drawn. Therefore, such analyses should be avoided.
If an unplanned interim analysis is conducted, the study report should
explain why it was necessary and the degree to which blindness had
to be broken, and provide an assessment of the potential magnitude
of bias introduced and the impact on the interpretation of the
results.
4.6 Role of Independent Data Monitoring Committee (IDMC)
(see sections 1.25 and 5.5.2 of ICH Guideline E6)
An IDMC may be established by the sponsor to assess at intervals
the progress of a clinical trial, safety data, and critical efficacy
variables and recommend to the sponsor whether to continue, modify,
or terminate a trial. The IDMC should have written operating
procedures and maintain records of its meetings. The independence of
the IDMC is intended to control the sharing of important comparative
information and to protect the integrity of the clinical trial from
adverse impact resulting from access to trial information. The IDMC
is a separate entity from an institutional review board (IRB) or an
ethics board, and its composition should include clinical trial
scientists knowledgeable in the appropriate disciplines, including
statistics.
When there are sponsor representatives on the IDMC, their role
should be clearly defined in the operating procedures of the
committee (for example, covering whether or not they can vote on key
issues). Since these sponsor staff would have access to unblinded
information, the procedures should also address the control of
dissemination of interim trial results within the sponsor
organization.
V. Data Analysis
5.1 Prespecified Analysis Plan
When designing a clinical trial, the principal features of the
eventual statistical analysis of the data should be described in the
statistical section of the protocol. This section should include all
features of the proposed confirmatory analysis of the primary
variable(s) and the way in which anticipated analysis problems will
be handled. In the case of exploratory trials, this section could
describe more general principles and directions.
Subsequently, a statistical analysis plan may be written as a
separate document. In this document, a more technical and detailed
elaboration of the principal features stated in the protocol may be
included. The statistical analysis plan is usually an internal
document and may include detailed procedures for executing the
statistical analysis. The statistical analysis plan should be
reviewed and possibly updated as a result of the blind review of the
data (see section 7.1 for definition).
If the blind review suggests changes to the principal features
stated in the protocol, these should be documented in a protocol
amendment. Otherwise, it will suffice to update the statistical
analysis plan with the considerations suggested from the blind
review. Only results from analyses envisaged in the protocol
(including amendments) can be regarded as confirmatory.
The statistical methodology, including when in the clinical
trial process methodology decisions were made, should be clearly
described in the statistical section of the clinical study report
(see ICH E3).
5.2 Analysis Sets
The set of subjects whose data are to be included in the main
analyses should be defined in the statistical section of the
protocol. In addition, documentation for all subjects for whom study
procedures (e.g., run-in period) were initiated may be useful. The
content of this subject documentation depends on detailed features
of the particular trial, but at least demographic and baseline data
on disease status should be collected whenever possible.
If all subjects randomized into a clinical trial satisfied all
entry criteria, followed all trial procedures perfectly with no
losses to followup, and provided complete data records, then the set
of subjects to be included in the analysis would be self-evident.
The design and conduct of a trial should aim to approach this ideal
as closely as possible, but, in practice, it is doubtful if it can
ever be fully achieved. Hence, the statistical section of the
protocol should address any anticipated problems prospectively in
terms of how these affect the subjects and data to be analyzed. The
protocol should also specify procedures aimed at minimizing any
anticipated irregularities in study conduct that might impair a
satisfactory analysis, including various types of protocol
violations, withdrawals, and missing values. The protocol should
consider ways both to reduce the frequency of such problems and to
handle the problems that occur in the analysis of data. The blind
review of data to identify possible amendments to the analysis plan
due to the protocol violations should be carried out before
unblinding. It is desirable to identify any important protocol
violation with respect to the time when it occurred, its cause, and
its influence on the trial result. The frequency and type of
protocol violations, missing values, and other problems should be
documented in the study report and their potential influence on the
trial results should be described (see ICH E3).
Decisions concerning the analysis set should be guided by the
following principles: (1) To minimize bias and (2) to avoid
inflation of Type I error.
5.2.1 All Randomized Subjects
The intention-to-treat principle implies that the primary
analysis should include all randomized subjects. In practice, this
ideal may be difficult to achieve, for reasons to be described.
Hence, analysis sets referred to as ``all randomized subjects'' may
not, in fact, include every subject. For example, it is common
practice to exclude from the all randomized set any subject who
failed to take at least one dose of trial medication or any subject
without data post randomization. No analysis is complete unless the
potential biases arising from these exclusions are addressed and can
be reasonably dismissed.
In many clinical trials, the ``all randomized subjects''
approach is conservative and also gives estimates of treatment
effects that are more likely to mirror those observed in subsequent
practice. Randomization prevents biased allocation of subjects to
treatments and provides the foundation of statistical tests. The
problems associated with the analysis of all randomized subjects lie
in the handling of protocol violations and the subtleties that this
can involve.
There are two types of major protocol violations. One is
violation of entry criteria. The second is violation of the protocol
after randomization. Subjects who fail to satisfy an objective entry
criterion measured prior to randomization, but who enter the trial,
may be excluded from analysis without introducing bias into the
treatment comparison, assuming all subjects receive equal scrutiny
for eligibility violations. (This may be difficult to ensure if the
data are unblinded.) Not all entry criteria are sufficiently
objective for this to be done satisfactorily. Reasons for excluding
subjects from the analysis of all randomized subjects should be
justified.
Other problems occur after randomization (error in treatment
assignment, use of excluded medications, poor compliance, loss to
followup, missing data, and other protocol violations). These
problems are especially difficult when their occurrence is related
to treatment assignment. It is good practice to assess the pattern
of such problems with respect to frequency and time to occurrence
among treatment groups. Subjects withdrawn from treatment may
introduce serious bias and, if they have provided no data after
withdrawal, there is no obvious solution. Severe protocol violation,
such as use of excluded medication, may also introduce serious bias
into measurements after such a violation. The necessary inclusion of
such subjects in the analysis may require some redefinition of the
primary variable or some assumptions about the subjects' outcomes.
Measurements of primary variables made at the time of the loss
to followup of a subject for any reason or at the time of a severe
protocol violation, or subsequently collected in accordance with the
protocol, are valuable in the context of all randomized subjects
analysis. Their use in analysis should be described and justified in
the statistical section of the protocol and their collection
described elsewhere in the protocol. However, the use of imputation
techniques can lead to biased estimates of treatment effects,
particularly when the likelihood of the loss of a subject is related
to treatment or response. Any other methods to be employed to ensure
the availability of measurements of primary variables for every
subject in the all randomized subjects analysis should be described.
Because of the unpredictability of some problems, it may
sometimes be preferable to defer detailed consideration of the
manner of dealing with irregularities until the blind review of the
data at the end of the study and, if so, this should be stated in
the protocol.
5.2.2 Per Protocol Subjects
The ``per protocol'' set of subjects, sometimes described as the
``valid cases,'' the ``efficacy'' sample, or the ``evaluable
subjects'' sample, defines a subset of the data used in the all
randomized subjects analysis and is characterized by the following
criteria:
(i) The completion of a certain prespecified minimal exposure to
the treatment regimen;
(ii) The availability of measurements of the primary
variable(s);
(iii) The absence of any major protocol violations, including
the violation of entry criteria. The nature of and reasons for
these protocol violations should be defined and documented before
breaking the blind.
This set may maximize the opportunity for a new treatment to
show additional efficacy in the analysis, and most closely reflects
the scientific model underlying the protocol. However, it may or may
not be conservative, depending on the study, and may be subject to
bias (possibly severe) because the subjects adhering most diligently
to the study protocol may not be representative of the entire study
population.
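The derivation of the two analysis sets can be sketched in code. The
record layout, the field names, and the minimal exposure of 10 doses
are hypothetical; in a real trial, each criterion would be
prespecified in the protocol and applied before breaking the blind.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    subject_id: str
    doses_taken: int
    has_post_randomization_data: bool
    has_primary_measurement: bool
    major_violation: bool


MIN_DOSES_PER_PROTOCOL = 10  # hypothetical prespecified minimal exposure


def all_randomized_set(subjects):
    """Common practice: drop subjects with no dose or no data after
    randomization."""
    return [s for s in subjects
            if s.doses_taken >= 1 and s.has_post_randomization_data]


def per_protocol_set(subjects):
    """Subset of the all randomized set meeting criteria (i) to (iii)."""
    return [s for s in all_randomized_set(subjects)
            if s.doses_taken >= MIN_DOSES_PER_PROTOCOL
            and s.has_primary_measurement
            and not s.major_violation]


subjects = [
    Subject("001", 14, True, True, False),
    Subject("002", 0, False, False, False),   # never dosed
    Subject("003", 12, True, True, True),     # major protocol violation
    Subject("004", 4, True, True, False),     # insufficient exposure
]
print([s.subject_id for s in all_randomized_set(subjects)])
print([s.subject_id for s in per_protocol_set(subjects)])
```

Reporting how many subjects each criterion removes makes the
potential for bias in either set easier to assess.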
5.2.3 Roles of the All Randomized Subjects Analysis and the Per
Protocol Analysis
In general, it is advantageous to demonstrate a lack of
sensitivity of the principal trial results to alternative choices of
the set of subjects analyzed. In confirmatory trials, it is usually
appropriate to plan to conduct both all randomized subjects and per
protocol analyses, so that any differences between them can be the
subject of explicit discussion and interpretation. In some cases, it
may be desirable to plan further exploration of the sensitivity of
conclusions to the choice of the set of subjects analyzed. When the
all randomized subjects and the per protocol analyses come to
essentially the same conclusions, confidence in the study results is
increased, bearing in mind, however, that the need to exclude a
substantial proportion of subjects from the per protocol analysis
throws some doubt on the overall validity of the study.
All randomized subjects and per protocol analyses play different
roles in superiority trials (which seek to show the investigational
product to be superior) and in equivalence or noninferiority trials
(which seek to show the investigational product to be comparable,
see section 3.3.2). In superiority studies, the all randomized
subjects analysis usually tends to avoid the optimistic estimate of
efficacy which may result from a per protocol analysis, since the
noncompliers included in an all randomized subjects analysis will
generally diminish the overall treatment effect. However, in an
equivalence or noninferiority trial, the all randomized subjects
analysis is no longer conservative and its role should be considered
very carefully.
5.3 Missing Values and Outliers
Missing values represent a potential source of bias in a
clinical trial. Hence, every effort should be undertaken to fulfill
all the requirements of the protocol concerning the collection and
management of data. However, in reality there will almost always be
some missing data. A study may be regarded as valid, nonetheless,
provided the methods of dealing with missing values are sensible,
particularly if those methods are predefined in the analysis plan of
the protocol. Predefinition of methods may be facilitated by
updating this aspect of the analysis plan during the blind review.
Unfortunately, no universally applicable methods of handling missing
values can be recommended. An investigation should be made
concerning the sensitivity of the results of analysis to the method
of handling missing values, especially if the number of missing
values is substantial.
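A sensitivity check of this kind might be sketched as follows. Two
simple handling methods are compared: analysis of complete cases only,
and imputation of "no change" for every missing value. Both methods,
the function name, and the data are assumptions chosen for
illustration; neither is recommended as generally appropriate.

```python
from statistics import mean


def mean_difference(test, control, impute=None):
    """Difference in group means (test minus control).

    With impute=None, missing values (None) are dropped; otherwise
    each missing value is replaced by `impute`.
    """
    def prepare(values):
        if impute is None:
            return [v for v in values if v is not None]
        return [impute if v is None else v for v in values]

    return mean(prepare(test)) - mean(prepare(control))


# Hypothetical changes from baseline; None marks a missing value.
test_change = [4.0, 6.0, None, 5.0, None]
control_change = [2.0, 3.0, 1.0, None, 2.0]

complete_case = mean_difference(test_change, control_change)
no_change_imputed = mean_difference(test_change, control_change, impute=0.0)
print("complete cases:", complete_case)
print("missing set to no change:", no_change_imputed)
```

A large gap between the two estimates, as in this example, signals
that conclusions depend heavily on how missing values are handled.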
A similar approach should be adopted to exploring the influence
of outliers, the statistical definition of which is, to some extent,
arbitrary. Clear identification of a particular value as an outlier
is most convincing when justified medically as well as
statistically, and the medical context will then often define the
appropriate action. Any outlier procedure set out in the protocol
should not favor any treatment group a priori. Once again, this
aspect of the analysis plan can be usefully updated during blind
review. If no procedure for dealing with outliers was foreseen in
the study protocol, one analysis with the actual values and at least
one other analysis eliminating or reducing the outlier effect should
be performed and differences between their results discussed.
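A minimal sketch of such a pair of analyses is given below. The data and the clamping limits are hypothetical, and clamping to fixed limits (winsorizing) is only one possible way of reducing the outlier effect; the same limits are applied to both groups so that the rule does not favor either treatment.

```python
from statistics import mean


def winsorize(values, lower, upper):
    """Clamp each value to [lower, upper] to reduce the pull of extremes."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical data with one extreme value in the treated group.
treated = [4.0, 5.0, 6.0, 5.0, 40.0]
control = [3.0, 4.0, 2.0, 3.0, 3.0]

# Hypothetical limits, assumed to be fixed without knowledge of treatment
# assignment and applied identically to both groups.
LOWER, UPPER = 0.0, 10.0

# Analysis 1: actual values.
actual = mean(treated) - mean(control)

# Analysis 2: outlier effect reduced by clamping.
reduced = (mean(winsorize(treated, LOWER, UPPER))
           - mean(winsorize(control, LOWER, UPPER)))

print(f"difference in means, actual values: {actual:.1f}")
print(f"difference in means, clamped values: {reduced:.1f}")
```

In this made-up example the single extreme value accounts for most of the apparent difference, which is the kind of discrepancy between the two analyses that should be discussed.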
5.4 Data Transformation/Modification
The decision to transform key variables prior to analysis is
best made during the design of the trial on the basis of similar
data from earlier clinical trials. Transformations (e.g., square
root, logarithm) should be specified in the protocol and a rationale
provided, especially for the primary variable(s). The general
principles guiding the use of transformations to ensure that the
assumptions underlying the statistical methods are met are to be
found in standard texts; conventions for particular variables have
been developed in a number of specific clinical areas. The decision
on whether and how to transform a variable should be influenced by
the preference for a scale that facilitates clinical interpretation.
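The link between a transformation and clinical interpretation can be illustrated with the logarithm. In the sketch below, which uses hypothetical data, the comparison is made on the log scale and the result is transformed back, so that it reads as a ratio of geometric means rather than as a difference in log units.

```python
import math
from statistics import mean

# Hypothetical positive-valued measurements (e.g., a concentration).
treated = [8.0, 16.0, 32.0]
control = [2.0, 4.0, 8.0]

# Analyze on the log scale.
log_treated = [math.log(x) for x in treated]
log_control = [math.log(x) for x in control]
log_difference = mean(log_treated) - mean(log_control)

# Back-transform: exp(difference of mean logs) is the ratio of geometric means.
ratio_of_geometric_means = math.exp(log_difference)

print(f"difference on log scale: {log_difference:.3f}")
print(f"ratio of geometric means: {ratio_of_geometric_means:.2f}")
```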
Similar considerations apply to other data modifications
sometimes used to create a variable for analysis, such as the use of
change from baseline, percentage change from baseline, the ``area
under the curve'' of repeated measures, or the ratio of two
different variables. Subsequent clinical interpretation should be
carefully considered, and the modification should be justified in
the protocol. Closely related points are made in section 2.2.2.
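The derived variables named above can be sketched for a single subject as follows. The visit times and values are hypothetical, and the trapezoidal rule is assumed here as the method for the ``area under the curve''; a protocol would need to state the rule actually used.

```python
def change_from_baseline(baseline, value):
    """Absolute change relative to the baseline measurement."""
    return value - baseline


def percent_change_from_baseline(baseline, value):
    """Change expressed as a percentage of baseline (baseline must be nonzero)."""
    return 100.0 * (value - baseline) / baseline


def auc_trapezoid(times, values):
    """Area under the curve of repeated measures by the trapezoidal rule."""
    points = list(zip(times, values))
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for (t0, v0), (t1, v1) in zip(points, points[1:]))


# Hypothetical repeated measures for one subject (time in weeks).
times = [0.0, 1.0, 2.0, 4.0]
values = [10.0, 12.0, 14.0, 10.0]

baseline = values[0]
week2 = values[2]

print(change_from_baseline(baseline, week2))
print(percent_change_from_baseline(baseline, week2))
print(auc_trapezoid(times, values))
```

Each derived variable answers a different clinical question, so the choice among them should be made with the intended interpretation in mind.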
5.5 Estimation, Confidence Intervals, and Hypothesis Testing
The statistical section of the protocol should specify the
hypotheses that are to be tested and/or the treatment effects that
are to be estimated to satisfy the objectives of the trial. The
statistical methods to be used to accomplish these tasks should be
described for the primary (and preferably the secondary) variables,
and the underlying statistical model should be made clear. Estimates
of treatment effects should be accompanied by confidence intervals,
whenever possible, and the way in which these will be calculated
should be identified. The plan should also describe any intentions
to use baseline data to improve precision and to adjust estimates
for potential baseline differences, for example, by means of
analysis of covariance. The reporting of precise p-values (e.g.,
``P=0.034'') should be envisaged in the plan, rather than exclusive
reference to critical values (e.g., ``P<0.05'').
It is important to clarify whether one- or two-sided tests of
statistical significance will be used and, in particular, to justify
prospectively the use of one-sided tests. If formal hypothesis tests
are not considered appropriate, then the alternative process for
arriving at statistical conclusions should be given.
The particular statistical model chosen should reflect the
current state of medical and statistical knowledge about the
variables to be analyzed. All effects to be fitted in the analysis
(for example, in analysis of variance models) should be fully
specified and the manner, if any, in which this set of effects might
be modified in response to preliminary results should be explained.
The same considerations apply to the set of covariates fitted in an
analysis of covariance. (See also section 5.7.)
In the choice of statistical methods, due attention should be
paid to the statistical distribution of both primary and secondary
variables. When making this choice, it is important to bear in mind
the need to provide statistical estimates of the size of treatment
effects together with confidence intervals (in addition to
significance tests), as this may influence the choice when there is
any doubt about the appropriateness of the method.
The primary analysis of the primary variable should be clearly
distinguished from supporting analyses of the primary or secondary
variables. Within the statistical section of the protocol there
should also be an outline of the way in which data other than the
primary and secondary variables will be summarized and reported.
This should include a reference to any approaches adopted for the
purpose of achieving consistency of analysis across a range of
studies, for example, for safety data.
[[Page 25723]]
5.6 Adjustment of Type I Error and Confidence Levels
When multiplicity is present, the usual frequentist approach to
the analysis of clinical trial data may necessitate an adjustment to
the type I error. Multiplicity may arise, for example, from multiple
primary variables (see section 2.2.2), multiple comparisons of
treatments, repeated evaluation over time, and/or interim analyses
(see section 4.6). Methods to avoid or reduce multiplicity are
sometimes preferable when available, such as the identification of
the key primary variable (multiple variables), the choice of a
critical treatment contrast (multiple comparisons), the use of a
summary measure such as ``area under the curve'' (repeated
measures). In confirmatory analyses, any aspects of multiplicity
that remain after steps of this kind have been taken should be
identified in the protocol; adjustment should always be considered
and the details of any adjustment
procedure or an explanation of why adjustment is not thought to be
necessary should be set out in the analysis plan.
5.7 Subgroups, Interactions, and Covariates
The primary variable(s) is often systematically related to other
influences apart from treatment. For example, there may be
relationships to covariates such as age and sex, or there may be
differences between specific subgroups of subjects, such as those
treated at the different centers of a multicenter trial. In some
instances, an adjustment for the influence of covariates or for
subgroup effects is an integral part of the analysis plan and hence
should be set out in the protocol. Prestudy deliberations should
identify those covariates and factors expected to have an important
influence on the primary variable(s), and should consider how to
account for these in the analysis to improve precision and to
compensate for any lack of balance between treatment groups. When
the potential value of an adjustment is in doubt, it is often
advisable to nominate the unadjusted analysis as the one for primary
attention, the adjusted analysis being supportive. Special attention
should be paid to center effects and to the role of baseline
measurements of the primary variable. It is not advisable to adjust
the main analyses for covariates measured after randomization
because they may be
affected by the treatments.
The treatment effect itself may also vary with subgroup or
covariate--for example, the effect may decrease with age or may be
larger in a particular diagnostic category of subjects. In some
cases such interactions are anticipated, hence a subgroup analysis
or a statistical model including interactions is part of the
confirmatory analysis plan. In most cases, however, subgroup or
interaction analyses are exploratory and should be clearly
identified as such; they should explore the uniformity of any
treatment effects found overall. In general, such analyses should
proceed first through the addition of interaction terms to the
statistical model in question, complemented by additional
exploratory analysis within relevant subgroups of subjects, or
within strata defined by the covariates. When exploratory, these
analyses should be interpreted cautiously; any conclusion of
treatment efficacy (or lack thereof) or safety based solely on
exploratory subgroup analyses is unlikely to be accepted.
5.8 Integrity of Data and Computer Software
The credibility of the numerical results of the analysis depends
on the quality and validity of the methods and software used both
for data management (data entry, storage, verification, correction,
and retrieval) and for processing the data statistically. Data
management activities should therefore be based on
thorough and effective SOP's. The computer software used for data
management and statistical analysis should be reliable, and
documentation of appropriate software testing procedures should be
available.
VI. Evaluation of Safety and Tolerability
6.1 Scope of Evaluation
In all clinical trials, evaluation of safety and tolerability
constitutes an important element. In early phases, this evaluation
is mostly of an exploratory nature and is only sensitive to frank
expressions of toxicity, whereas in later phases, the establishment
of the safety and tolerability profile of a drug can be
characterized more fully in larger samples of subjects. Later phase
controlled trials represent an important means of exploring, in an
unbiased manner, any new potential adverse effects, even if such
trials generally lack power in this respect.
Certain studies may be designed with the purpose of making
specific claims about superiority or equivalence with regard to
safety and tolerability compared to another drug or to another dose
of the investigational drug. Such specific claims should be
supported by relevant evidence from confirmatory studies, similar to
that necessary for corresponding efficacy claims.
6.2 Choice of Variables and Data Collection
In any clinical trial, the methods and measurements chosen to
evaluate the safety and tolerability of a drug will depend on a
number of factors, including knowledge of the adverse effects of
closely related drugs, information from nonclinical and earlier
clinical studies, and possible consequences of the
pharmacodynamic/pharmacokinetic properties of the particular drug,
the mode of administration, the type of subjects to be studied, and
the duration of the study. Laboratory tests concerning clinical
chemistry and hematology, vital signs, and clinical adverse events
(diseases, signs, and symptoms) usually form the main body of the
safety and tolerability data. The occurrence of serious adverse
events and treatment discontinuations due to adverse events are
particularly important to register (see ICH E2A and ICH E3).
Furthermore, it is recommended that a consistent methodology be
used for the data collection and evaluation throughout a clinical
trial program to facilitate the combining of data from different
trials. The use of a common adverse event dictionary is particularly
important. This dictionary has a structure that makes it possible to
summarize the adverse event data on three different levels:
system-organ class, preferred term, or included term. The preferred
term is the level on which adverse events usually are summarized,
and preferred terms belonging to the same system-organ class could
then be brought together in the descriptive presentation of data
(see ICH E2B).
6.3 Set of Subjects to be Evaluated and Presentation of Data
For the overall safety and tolerability assessment, the set of
subjects to be summarized is usually defined as those subjects who
received at least one dose of the investigational drug. Safety and
tolerability variables should be collected as comprehensively as
possible from these subjects, including type of adverse event,
severity, onset, and duration (see ICH E2B). Additional safety and
tolerability evaluations may be needed in specific subpopulations,
such as females, the elderly (see ICH E7), the severely ill, or
those who have a common concomitant treatment. These evaluations may
need to address more specific issues (see ICH E3).
All safety and tolerability variables need attention during
evaluation, and the broad approach should be indicated in the
protocol. All adverse events should be reported, whether or not they
are considered to be related to treatment. All available data in the
study population should be accounted for in the evaluation.
Definitions of measurement units and reference ranges of laboratory
variables should be made with care; if different units or different
reference ranges appear in the same trial (e.g., if more than one
laboratory is involved), then measurements should be appropriately
standardized to allow a unified evaluation. Use of a toxicity
grading scale
should be prespecified and justified.
The incidence of a certain adverse event is usually expressed in
the form of a proportion relating number of subjects experiencing
events to number of subjects at risk. However, it is not always
self-evident how to assess incidence. For example, depending on the
situation, the number of exposed subjects or the extent of exposure
(in person-years) could be considered for the denominator. Whether
the purpose of the calculation is to estimate a risk or to make a
comparison between treatment groups, it is important that the
definition is given in the protocol. This is especially important if
long-term treatment is planned and a substantial proportion of
treatment withdrawals or deaths are expected. For such situations,
survival analysis methods should be considered and cumulative
adverse event rates calculated in order to avoid the risk of
underestimation.
Methods to account for situations where there is a substantial
background noise of signs and symptoms (e.g., in psychiatric trials)
should be considered in the estimation of risk for different adverse
events. One such method is to make use of the ``treatment emergent''
concept in which adverse events are recorded only if they emerge or
worsen relative to pretreatment baseline. Other methods to reduce
the background noise may also be appropriate, such as
ignoring adverse events of mild severity or requiring that an event
should have been
[[Page 25724]]
observed at repeated visits to qualify for inclusion in the
numerator. Such methods should be explained and justified in the
protocol.
6.4 Statistical Evaluation
The investigation of safety and tolerability is a
multidimensional problem. Although some specific adverse effects can
usually be anticipated and specifically monitored for any drug, the
range of possible adverse effects is very large, and new and
unforeseeable effects are always possible. Further, an adverse event
experienced after a protocol violation, such as use of an excluded
medication, may introduce a bias. This background underlies the
statistical difficulties associated with the analytical evaluation
of safety and tolerability of drugs, and means that confirmatory
information from phase III clinical trials is the exception rather
than the rule.
In most trials, the safety and tolerability implications are
best addressed by applying descriptive statistical methods to the
data, supplemented by calculation of confidence intervals wherever
this aids interpretation. It is also valuable to make use of
graphical presentations in which patterns of adverse events are
displayed both within treatment groups and within subjects.
The calculation of p-values is sometimes useful, either as an
aid to evaluating a
specific difference of interest or as a ``flagging'' device applied
to a large number of safety and tolerability variables to highlight
differences worthy of further attention. This is particularly useful
for laboratory data, which otherwise can be difficult to summarize
appropriately. It is recommended that laboratory data be subjected
to both a quantitative analysis, e.g., evaluation of treatment
means, and a qualitative analysis, in which counts of values above
or below certain thresholds are calculated.
If hypothesis tests are used, statistical adjustments for
multiplicity to quantitate the type I error are appropriate, but the
type II error is usually of more concern. Care should be taken when
interpreting putative statistically significant findings when there
is no multiplicity adjustment.
In the majority of studies, investigators are seeking to
establish that there are no clinically unacceptable differences in
safety and tolerability compared with either a comparator drug or a
placebo. As is the case for noninferiority or equivalence evaluation
of efficacy, the use of confidence intervals is preferred to
hypothesis testing in this situation. In this way, the considerable
imprecision often arising from low frequencies of occurrence is
clearly demonstrated.
6.5 Single Study Versus Integrated Summary
The safety and tolerability properties of a drug are commonly
summarized across studies continuously during an investigational
product's development and, in particular, for the submission of a
marketing application. The usefulness of this summary, however, is
dependent on adequate and well-controlled individual studies with
high data quality.
The overall usefulness of a drug is always a question of balance
between risk and benefit; in a single trial, such a perspective
could also be considered even if the assessment of risk/benefit
usually is performed in the summary of the entire clinical trial
program. (See section 7.1.2.)
For more details of safety and tolerability reports, see section
12 of the ICH guideline E3 on ``Clinical Study Reports: Structure
and Content.''
VII. Reporting
7.1 Evaluation and Reporting
As stated in the introduction, the structure and content of
clinical reports is the subject of ICH guideline E3. That ICH
guideline fully covers the reporting of statistical work,
appropriately integrated with clinical and other material. The
current section is therefore relatively brief.
During the planning phase of a trial, the principal features of
the analysis should have been specified in the protocol as described
in section 5. When the conduct of the trial is over and the data are
assembled and available for preliminary inspection, it is valuable
to carry out the blind review of the
planned analysis also described in section 5. This preanalysis
review, blinded to treatment, should: (1) Cover decisions concerning
the exclusion of subjects or data from the analysis sets; (2) check
possible transformations and define outliers; (3) add to the model
important covariates identified in other recent research; (4)
reconsider the use of parametric or nonparametric methods. Decisions
made at this time should be described in the report and should be
distinguished from those made after the statistician has had access
to the treatment codes, as blind decisions will generally introduce
less potential for bias.
Many of the more detailed aspects of presentation and tabulation
should be finalized at or about the time of the blind review so
that, by the time of the actual analysis, full plans exist for all
its aspects including subject selection, data selection and
modification, data summary and tabulation, estimation and hypothesis
testing. Once data validation is complete, the analysis should
proceed according to the predefined plans; the more these plans are
adhered to, the greater the credibility of the results. Particular
attention should be paid to any differences between the planned
analysis and the actual analysis as described in the protocol, the
protocol amendments, or the updated statistical analysis plan based
on a blind review
of data. A careful explanation should be provided for deviations
from the planned analysis.
All subjects who entered the trial should be accounted for in
the report, whether or not they are included in the analysis. All
reasons for exclusion from analysis should be documented; for any
subject included in the set of all randomized subjects but not in
the per-protocol set, the reasons for exclusion from the latter
should also be documented. Similarly, for all subjects included in
an analysis set, the measurements of all important variables should
be accounted for at all relevant time-points.
The effect of all losses of subjects or data, withdrawals from
treatment, and major protocol violations on the main analyses of the
primary variable(s) should be considered carefully. Subjects lost to
followup, withdrawn from treatment, or with a severe protocol
violation should be identified; a descriptive analysis of the
subjects should be provided, including the reasons for their loss
and the relationship of the loss to treatment and outcome.
Descriptive statistics form an indispensable part of reports.
Suitable tables and/or graphical presentations should illustrate
clearly the important features of the primary and secondary
variables and of key prognostic and demographic variables. The
results of the main analyses relating to the objectives of the trial
should be the subject of particularly careful descriptive
presentation.
Although the primary goal of the analysis of a clinical trial
should be to answer the questions posed by its main objectives, new
questions based on the observed data may well emerge during the
unblinded analysis. Additional and perhaps complex statistical
analysis may be the consequence. This additional work should be
strictly distinguished in the report from work that was planned in
the protocol.
The play of chance may lead to unforeseen imbalances between the
treatment groups in terms of baseline measurements not predefined as
covariates in the analysis plan but having some prognostic
importance nevertheless. This is best dealt with by showing that a
subsidiary analysis that accounts for these imbalances reaches
essentially the same conclusions as the planned analysis. If this is
not the case, the effect of the imbalances on the conclusions should
be discussed.
In general, sparing use should be made of unplanned subsidiary
analyses. Subsidiary analyses are often carried out when it is
thought that the treatment effect may vary according to some other
factor or factors. An attempt may then be made to identify subgroups
of subjects for whom the effect is particularly beneficial. The
potential dangers of over-interpretation of unplanned subgroup
analyses are well known
(see also section 5.7) and should be carefully avoided. Although
similar problems of interpretation arise if a treatment appears to
have no benefit, or an adverse effect, in a subgroup of subjects,
such possibilities need to be properly assessed and should therefore
be reported.
Finally, statistical judgement should be brought to bear on the
analysis, interpretation, and presentation of the results of a
clinical trial. To this end, the trial statistician should be a
member of the team responsible for the study report and should
approve the final report.
7.2 Summarizing the Clinical Database
An overall summary and synthesis of the evidence on safety and
efficacy from all the reported clinical trials is required for a
marketing application. This may be accompanied, when appropriate, by
a statistical combination of results.
Within the summary a number of areas of specific statistical
interest arise: Describing the demography and clinical features of
the population treated during the course of the
[[Page 25725]]
clinical trial program; addressing the key questions of efficacy by
considering the results of the relevant (usually controlled) trials
and highlighting the degree to which they reinforce or contradict
each other; summarizing the safety information available from the
combined database of all the studies whose results contribute to the
marketing
application and identifying potential safety issues.

During the design of a clinical program, careful attention should be
paid to the uniform definition and collection of measurements which
will facilitate subsequent interpretation of the series of trials,
particularly if they are likely to be combined across trials. A common
dictionary for recording the details of medication, medical history,
and adverse events should be selected and used. A common definition of
the primary and secondary variables is nearly always worthwhile and is
essential for meta-analysis. The manner of measuring key efficacy
variables, the timing of assessments relative to randomization/entry,
the handling of protocol violators and deviators, and perhaps the
definition of prognostic factors, should all be kept compatible unless
there are valid reasons not to do so.

Any statistical procedures used to combine data across trials should be
described in detail. Attention should be paid to the possibility of
bias associated with the selection of trials, to the homogeneity of
their results, and to the proper modeling of the various sources of
variation. The sensitivity of conclusions to the assumptions and
selections made should be explored.

7.2.1 Efficacy Data

Individual clinical trials should always be large enough to satisfy
their objectives. Additional valuable information may also be
gained by summarizing a series of clinical trials that address
essentially identical key efficacy questions. The main results of such
a set of studies should be presented in an identical form to permit
comparison, usually in tables or graphs that focus on estimates plus
confidence limits. The use of meta-analytic techniques to combine these
estimates is often a useful addition because it allows a more precise
overall estimate of the size of the treatment effects to be generated
and provides a complete and concise summary of the results of the
trials. Under exceptional circumstances, a meta-analytic approach may
also be the most appropriate way, or the only way, of providing
sufficient overall evidence of efficacy via an overall hypothesis test.

7.2.2 Safety Data

In summarizing safety data, it is important to examine the safety
database thoroughly for any indications of potential toxicity and to
follow up any indications by looking for an associated supportive
pattern of observations. The combination of the safety data from all
human exposure to the drug provides an important source of information
because its larger sample size provides the best chance of detecting
the rarer adverse events and, perhaps, of estimating their approximate
incidence. However, incidence data from this database are difficult to
evaluate without a natural comparator
group, and data from comparative studies are especially valuable in
overcoming this difficulty. The results from studies that use a common
comparator (placebo or specific active comparator) should be combined
and presented separately for each comparator providing sufficient data.

All indications of potential toxicity arising from exploration of the
data should be reported. The evaluation of the reality of these
potential adverse effects should take into account the issue of
multiplicity arising from the numerous comparisons made. The evaluation
should also make appropriate use of survival analysis methods to
exploit the potential relationship of the incidence of adverse events
to duration of exposure and/or followup. The risks associated with
identified adverse effects should be appropriately quantified to allow
a proper assessment of the risk/benefit relationship.

Annex 1 Glossary

All randomized subjects--The analysis set that includes all subjects
who were randomized to treatment, with these subjects assigned to the
treatment group to which they were randomized. Practical
considerations, such as missing data, may lead to some subjects in this
set not being included in the corresponding analysis.

Analysis plan--The strategy for analysis predefined in the statistical
section of the protocol and/or protocol amendments. The plan may be
elaborated in a separate document
(internal to the sponsor) to cover technical details and procedures for
implementing the statistical analyses. The plan should be reviewed and
possibly updated as a result of the blind review of the data.

Bayesian approaches--Approaches to data analysis that provide a
posterior probability distribution for some parameter (e.g., treatment
effect), derived from the observed data and a prior probability
distribution for the parameter. The posterior distribution is then used
as the basis for statistical inference.

Bias (statistical and operational)--The systematic tendency of any
factors associated with the design, conduct, analysis, and evaluation
of the results of a clinical trial to make the estimate of a treatment
effect deviate from its true value. Bias introduced through deviations
in conduct is referred to as ``operational'' bias. The other sources of
bias listed above are referred to as ``statistical.''

Blind review--The checking and assessment of data during the course of
the study, but before the breaking of the blind, for the purpose of
finalizing the analysis plan.

Content validity--The extent to which a variable (e.g., a rating scale)
measures what it is supposed to measure.

Double dummy--A technique for retaining the blind when administering
supplies in a clinical trial, when the two treatments cannot be made
identical. Supplies are prepared
for treatment A (active and indistinguishable placebo) and for
treatment B (active and indistinguishable placebo). Subjects then take
two sets of treatment; either A (active) and B (placebo), or A
(placebo) and B (active).

Dropout--A subject in a clinical trial who for any reason fails to
continue in the trial until the last visit required of him/her by the
study protocol.

Equivalence trial--A trial with the primary objective of showing that
the response to two or more treatments differs by an amount which is
clinically unimportant. This is usually demonstrated by showing that
the true treatment difference is likely to lie between a lower and an
upper equivalence margin of clinically acceptable differences.

Frequentist methods--Statistical methods, such as significance tests
and confidence intervals, which can be interpreted in terms of the
frequency of certain outcomes occurring in hypothetical repeated
realizations of the same experimental situation.

Generalizability, generalization--The extent to which the findings of a
clinical trial can be reliably extrapolated from the subjects who
participated in the trial to a broader patient population.

Global assessment variable--A single variable, usually a scale of
ordered categorical ratings, that integrates objective variables and
the investigator's overall impression about the state or change in
state of a subject.

Independent data monitoring committee (IDMC) (data and safety
monitoring board, monitoring committee, data monitoring committee)--An
independent data monitoring committee that may be established by the
sponsor to assess at intervals the progress of a clinical trial, the
safety data, and the critical efficacy endpoints, and to recommend to
the sponsor whether to continue, modify, or stop a trial.

Intention-to-treat principle--The principle that asserts that the
effect of a treatment policy can be best assessed by evaluating on the
basis of the intention to treat a subject (i.e., the planned treatment
regimen) rather than the actual treatment given. It has the consequence
that subjects allocated to a treatment group should be followed up,
assessed, and analyzed as members of that group irrespective of their
compliance to the planned course of treatment.

Interaction (qualitative and quantitative)--The situation in which a
treatment contrast (e.g., difference between investigational product
and control) is dependent on another factor (e.g., center). A
quantitative interaction refers to the case where the magnitude of the
contrast differs at the different levels of the factor, whereas for a
qualitative interaction the direction of the contrast differs for at
least one level of the factor.

Inter- and intrarater reliability--The level of consistency of a rater
(intra) or a group of raters (inter) in making an assessment of
treatment outcome.

Interim analysis--Any analysis intended to compare treatment arms with
respect to efficacy or safety at any time prior to the formal
completion of a trial.

Meta-analysis--The formal evaluation of the quantitative evidence from
two or more trials bearing on the same question. This most commonly
involves the statistical combination of summary statistics from the
various trials, but the term is sometimes used to refer to the
combination of the raw data.

[[Page 25726]]

Multicenter trial--A trial involving two or more study centers, a
common study protocol, and a single analysis plan pooling the data
across all centers.

Noninferiority trial--A trial with the primary objective of showing
that the response to the investigational product is not clinically
inferior to a comparative agent (active or placebo control).

Preferred and included terms--In a hierarchical medical dictionary, for
example, WHO-ART, the included term is the lowest level of dictionary
term to which the investigator description is coded. The preferred term
is the level of grouping of included terms typically used in reporting
frequency of occurrence. For example, the investigator text ``Pain in
the left arm'' might be coded to the included term ``Joint pain,''
which is reported at the preferred term
level as ``Arthralgia.''

Per protocol set (valid cases, efficacy sample, evaluable subjects
sample)--The set of data generated by the subset of subjects who
complied with the protocol sufficiently to ensure that these data would
be likely to exhibit the effects of treatment according to the
underlying scientific model. Compliance covers such considerations as
exposure to treatment, availability of measurements, and absence of
major protocol violations.

Safety and tolerability--The safety of a medical product concerns the
medical risk to the subject, usually assessed in a clinical trial by
laboratory tests (including clinical chemistry and hematology), vital
signs, clinical adverse events (diseases, signs and symptoms), and
other special safety tests (e.g., electrocardiograms, ophthalmology).
The tolerability of the medical product represents the degree to which
overt adverse effects can be tolerated by the subject.

Superiority trial--A trial with the primary objective of showing that
the response to the investigational product is superior to a
comparative agent (active or placebo control).

Surrogate variable--A variable that provides an indirect measurement of
effect in situations where direct measurement of clinical effect is not
feasible or practical.

Treatment effect--An effect attributed to a treatment in a clinical
trial. In most clinical trials, the treatment effect of
interest is a comparison (or contrast) of two or more treatments.

Treatment emergent--An event that emerges during treatment, having been
absent pretreatment, or worsens relative to the pretreatment state.

Dated: April 30, 1997.
William K. Hubbard,
Associate Commissioner for Policy Coordination.
[FR Doc. 97-12139 Filed 5-8-97; 8:45 am]
BILLING CODE 4160-01-F