[Federal Register Volume 63, Number 179 (Wednesday, September 16, 1998)]
[Notices]
[Pages 49583-49598]
From the Federal Register Online via the Government Publishing Office [www.gpo.gov]
[FR Doc No: 98-24754]
-----------------------------------------------------------------------
DEPARTMENT OF HEALTH AND HUMAN SERVICES
Food and Drug Administration
[Docket No. 97D-0174]
International Conference on Harmonisation; Guidance on
Statistical Principles for Clinical Trials; Availability
AGENCY: Food and Drug Administration, HHS.
ACTION: Notice.
-----------------------------------------------------------------------
SUMMARY: The Food and Drug Administration (FDA) is publishing a
guidance entitled ``E9 Statistical Principles for Clinical Trials.''
The guidance was prepared under the auspices of the International
Conference on Harmonisation of Technical Requirements for Registration
of Pharmaceuticals for Human Use (ICH). The guidance is intended to
provide recommendations to sponsors and scientific experts regarding
statistical principles and methodology which, when applied to clinical
trials for marketing applications, will facilitate the general
acceptance of analyses and conclusions drawn from the trials.
DATES: Effective September 16, 1998. Submit written comments at any
time.
ADDRESSES: Submit written comments on the guidance to the Dockets
Management Branch (HFA-305), Food and Drug Administration, 5630 Fishers
Lane, rm. 1061, Rockville, MD 20852. Copies of the guidance are
available from the Drug Information Branch (HFD-210), Center for Drug
Evaluation and Research, Food and Drug Administration, 5600 Fishers
Lane, Rockville, MD 20857, 301-827-4573. Single copies of the guidance
may be obtained by mail from the Office of Communication, Training and
Manufacturers Assistance (HFM-40), Center for Biologics Evaluation and
Research (CBER), 1401 Rockville Pike, Rockville, MD 20852-1448, or by
calling the CBER Voice Information System at 1-800-835-4709 or 301-827-
1800. Copies may be obtained from CBER's FAX Information System at 1-
888-CBER-FAX or 301-827-3844.
FOR FURTHER INFORMATION CONTACT:
Regarding the guidance: Robert O'Neill, Center for Drug Evaluation
and Research (HFD-700), Food and Drug Administration, 5600 Fishers
Lane, Rockville, MD 20857, 301-827-3195.
Regarding the ICH: Janet J. Showalter, Office of Health Affairs
(HFY-20), Food and Drug Administration, 5600 Fishers Lane, Rockville,
MD 20857, 301-827-0864.
SUPPLEMENTARY INFORMATION: In recent years, many important initiatives
have been undertaken by regulatory authorities and industry
associations to promote international harmonization of regulatory
requirements. FDA has participated in many meetings designed to enhance
harmonization and is committed to seeking scientifically based
harmonized technical procedures for pharmaceutical development. One of
the goals of harmonization is to identify and then reduce differences
in technical requirements for drug development among regulatory
agencies.
ICH was organized to provide an opportunity for tripartite
harmonization initiatives to be developed with input from both
regulatory and industry representatives. FDA also seeks input from
consumer representatives and others. ICH is concerned with
harmonization of technical requirements for the registration of
pharmaceutical products among three regions: The European Union, Japan,
and the United States. The six ICH sponsors are: The European
Commission, the European Federation of Pharmaceutical Industries
Associations, the Japanese Ministry of Health and Welfare, the Japanese
Pharmaceutical Manufacturers Association, the Centers for Drug
Evaluation and Research and Biologics Evaluation and Research, FDA, and
the Pharmaceutical Research and Manufacturers of America. The ICH
Secretariat, which coordinates the preparation of documentation, is
provided by the International Federation of Pharmaceutical
Manufacturers Associations (IFPMA).
The ICH Steering Committee includes representatives from each of
the ICH sponsors and the IFPMA, as well as observers from the World
Health Organization, the Canadian Health Protection Branch, and the
European Free Trade Area.
In the Federal Register of May 9, 1997 (62 FR 25712), FDA published
a draft tripartite guideline entitled ``Statistical Principles for
Clinical Trials'' (E9). The notice gave interested persons an
opportunity to submit comments by June 23, 1997.
After consideration of the comments received and revisions to the
guidance, a final draft of the guidance was submitted to the ICH
Steering Committee and endorsed by the three participating regulatory
agencies on February 5, 1998.
In accordance with FDA's Good Guidance Practices (62 FR 8961,
February 27, 1997), this document has been designated a guidance,
rather than a guideline.
The guidance addresses principles of statistical methodology
applied to clinical trials for marketing applications. The guidance
provides recommendations to sponsors for the design, conduct, analysis,
and evaluation of clinical trials of an investigational product in the
context of its overall clinical development. The document also provides
guidance to scientific experts in preparing application summaries or
assessing evidence of efficacy and safety, principally from late Phase
II and Phase III clinical trials. Application of the principles of
statistical methodology is intended to facilitate the general
acceptance of analyses and conclusions drawn from clinical trials.
This guidance represents the agency's current thinking on
statistical principles for clinical trials of drugs and biologics.
It does not create or confer any rights for, or on, any person and
does not operate to bind FDA or the public. An alternative approach may
be used if such approach satisfies the requirements of the applicable
statute, regulations, or both.
As with all of FDA's guidances, the public is encouraged to submit
written comments with new data or other new information pertinent to
this guidance. The comments in the docket will be periodically
reviewed, and, where appropriate, the guidance will be amended. The
public will be notified of any such amendments through a notice in the
Federal Register.
Interested persons may, at any time, submit written comments on the
guidance to the Dockets Management Branch (address above). Two copies
of any comments are to be submitted, except that individuals may submit
one copy. Comments are to be identified with the docket number found in
brackets in the heading of this document. The guidance and received
comments may be seen in the office above between 9 a.m. and 4 p.m.,
Monday through Friday. An electronic version of this guidance is
available on the Internet at ``http://www.fda.gov/cder/guidance/
index.htm'' or at CBER's World Wide Web site at ``http://www.fda.gov/
cber/publications.htm''.
The text of the guidance follows:
E9 Statistical Principles for Clinical Trials \1\
---------------------------------------------------------------------------
\1\ This guidance represents the agency's current thinking on
statistical principles for clinical trials of drugs and biologics.
It does not create or confer any rights for or on any person and
does not operate to bind FDA or the public. An alternative approach
may be used if such approach satisfies the requirements of the
applicable statute, regulations, or both.
---------------------------------------------------------------------------
Note: A glossary of terms and definitions is provided as an
annex to this guidance.
I. Introduction
1.1 Background and Purpose
1.2 Scope and Direction
II. Considerations for Overall Clinical Development
2.1 Trial Context
2.1.1 Development Plan
2.1.2 Confirmatory Trial
2.1.3 Exploratory Trial
2.2 Scope of Trials
2.2.1 Population
2.2.2 Primary and Secondary Variables
2.2.3 Composite Variables
2.2.4 Global Assessment Variables
2.2.5 Multiple Primary Variables
2.2.6 Surrogate Variables
2.2.7 Categorized Variables
2.3 Design Techniques to Avoid Bias
2.3.1 Blinding
2.3.2 Randomization
III. Trial Design Considerations
3.1 Design Configuration
3.1.1 Parallel Group Design
3.1.2 Crossover Design
3.1.3 Factorial Designs
3.2 Multicenter Trials
3.3 Type of Comparison
3.3.1 Trials to Show Superiority
3.3.2 Trials to Show Equivalence or Noninferiority
3.3.3 Trials to Show Dose-Response Relationship
3.4 Group Sequential Designs
3.5 Sample Size
3.6 Data Capture and Processing
IV. Trial Conduct Considerations
4.1 Trial Monitoring and Interim Analysis
4.2 Changes in Inclusion and Exclusion Criteria
4.3 Accrual Rates
4.4 Sample Size Adjustment
4.5 Interim Analysis and Early Stopping
4.6 Role of Independent Data Monitoring Committee (IDMC)
V. Data Analysis Considerations
5.1 Prespecification of the Analysis
5.2 Analysis Sets
5.2.1 Full Analysis Set
5.2.2 Per Protocol Set
5.2.3 Roles of the Different Analysis Sets
5.3 Missing Values and Outliers
5.4 Data Transformation
5.5 Estimation, Confidence Intervals, and Hypothesis Testing
5.6 Adjustment of Significance and Confidence Levels
5.7 Subgroups, Interactions, and Covariates
5.8 Integrity of Data and Computer Software Validity
VI. Evaluation of Safety and Tolerability
6.1 Scope of Evaluation
6.2 Choice of Variables and Data Collection
6.3 Set of Subjects to be Evaluated and Presentation of Data
6.4 Statistical Evaluation
6.5 Integrated Summary
VII. Reporting
7.1 Evaluation and Reporting
7.2 Summarizing the Clinical Database
7.2.1 Efficacy Data
7.2.2 Safety Data
Annex 1 Glossary
I. Introduction
1.1 Background and Purpose
The efficacy and safety of medicinal products should be
demonstrated by clinical trials that follow the guidance in ``Good
Clinical Practice: Consolidated Guideline'' (ICH E6) adopted by the
ICH, May 1, 1996. The role of statistics in clinical trial design
and analysis is acknowledged as essential in that ICH guideline. The
proliferation of statistical research in the area of clinical trials
coupled with the critical role of clinical research in the drug
approval process and health care in general necessitate a succinct
document on statistical issues related to clinical trials. This
guidance is written primarily to attempt to harmonize the principles
of statistical methodology applied to clinical trials for marketing
applications submitted in Europe, Japan, and the United States.
As a starting point, this guidance utilized the CPMP (Committee
for Proprietary Medicinal Products) Note for Guidance entitled
``Biostatistical Methodology in Clinical Trials in Applications for
Marketing Authorizations for Medicinal Products'' (December 1994).
It was also influenced by ``Guidelines on the Statistical Analysis
of Clinical Studies'' (March 1992) from the Japanese Ministry of
Health and Welfare and the U.S. Food and Drug Administration
document entitled ``Guideline for the Format and Content of the
Clinical and Statistical Sections of a New Drug Application'' (July
1988). Some topics related to statistical principles and methodology
are also embedded within other ICH guidances, particularly those
listed below. The specific guidance that contains related text will
be identified in various sections of this document.
E1A: The Extent of Population Exposure to Assess Clinical Safety
E2A: Clinical Safety Data Management: Definitions and Standards
for Expedited Reporting
E2B: Clinical Safety Data Management: Data Elements for
Transmission of Individual Case Safety Reports
E2C: Clinical Safety Data Management: Periodic Safety Update
Reports for Marketed Drugs
E3: Structure and Content of Clinical Study Reports
E4: Dose-Response Information to Support Drug Registration
E5: Ethnic Factors in the Acceptability of Foreign Clinical Data
E6: Good Clinical Practice: Consolidated Guideline
E7: Studies in Support of Special Populations: Geriatrics
E8: General Considerations for Clinical Trials
E10: Choice of Control Group in Clinical Trials
M1: Standardization of Medical Terminology for Regulatory
Purposes
M3: Nonclinical Safety Studies for the Conduct of Human Clinical
Trials for Pharmaceuticals.
This guidance is intended to give direction to sponsors in the
design, conduct, analysis, and evaluation of clinical trials of an
investigational product in the context of its overall clinical
development. The document will also assist scientific experts
charged with preparing application summaries or assessing evidence
of efficacy and safety, principally from clinical trials in later
phases of development.
1.2 Scope and Direction
The focus of this guidance is on statistical principles. It does
not address the use of specific statistical procedures or methods.
Specific procedural steps to ensure that principles are implemented
properly are the responsibility of the sponsor. Integration of data
across clinical trials is discussed, but is not a primary focus of
this guidance. Selected principles and procedures related to data
management or clinical trial monitoring activities are covered in
other ICH guidances and are not addressed here.
This guidance should be of interest to individuals from a broad
range of scientific disciplines. However, it is assumed that the
actual responsibility for all statistical work associated with
clinical trials will lie with an appropriately qualified and
experienced statistician, as indicated in ICH E6. The role
and responsibility of the trial statistician (see Glossary), in
collaboration with other clinical trial professionals, is to ensure
that statistical principles are applied appropriately in clinical
trials supporting drug development. Thus, the trial statistician
should have a combination of education/training and experience
sufficient to implement the principles articulated in this guidance.
For each clinical trial contributing to a marketing application,
all important details of its design and conduct and the principal
features of its proposed statistical analysis should be clearly
specified in a protocol written before the trial begins. The extent
to which the procedures in the protocol are followed and the primary
analysis is planned a priori will contribute to the degree of
confidence in the final results and conclusions of the trial. The
protocol and subsequent amendments should be approved by the
responsible personnel, including the trial statistician. The trial
statistician should ensure that the protocol and any amendments
cover all relevant statistical issues clearly and accurately, using
technical terminology as appropriate.
The principles outlined in this guidance are primarily relevant
to clinical trials conducted in the later phases of development,
many of which are confirmatory trials of efficacy. In addition to
efficacy, confirmatory trials may have as their primary variable a
safety variable (e.g., an adverse event, a clinical laboratory
variable, or an electrocardiographic measure) or a pharmacodynamic
or pharmacokinetic variable (as in a confirmatory bioequivalence
trial). Furthermore, some confirmatory findings may be derived from
data integrated across trials, and selected principles in this
guidance are applicable in this situation. Finally, although the
early phases of drug development consist mainly of clinical trials
that are exploratory in nature, statistical principles are also
relevant to these clinical trials. Hence, the substance of this
document should be applied as far as possible to all phases of
clinical development.
Many of the principles delineated in this guidance deal with
minimizing bias (see Glossary) and maximizing precision. As used in
this guidance, the term ``bias'' describes the systematic tendency
of any factors associated with the design, conduct, analysis, and
interpretation of the results of clinical trials to make the
estimate of a treatment effect (see Glossary) deviate from its true
value. It is important to identify potential sources of bias as
completely as possible so that attempts to limit such bias may be
made. The presence of bias may seriously compromise the ability to
draw valid conclusions from clinical trials.
Some sources of bias arise from the design of the trial, for
example an assignment of treatments such that subjects at lower risk
are systematically assigned to one treatment. Other sources of bias
arise during the conduct and analysis of a clinical trial. For
example, protocol violations and exclusion of subjects from analysis
based upon knowledge of subject outcomes are possible sources of
bias that may affect the accurate assessment of the treatment
effect. Because bias can occur in subtle or unknown ways and its
effect is not measurable directly, it is important to evaluate the
robustness of the results and primary conclusions of the trial.
Robustness is a concept that refers to the sensitivity of the
overall conclusions to various limitations of the data, assumptions,
and analytic approaches to data analysis. Robustness implies that
the treatment effect and primary conclusions of the trial are not
substantially affected when analyses are carried out based on
alternative assumptions or analytic approaches. The interpretation
of statistical measures of uncertainty of the treatment effect and
treatment comparisons should involve consideration of the potential
contribution of bias to the p-value, confidence interval, or
inference.
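As an illustration of this concept (an editorial sketch, not part of the ICH text), the following Python fragment re-runs a single simulated treatment comparison under alternative analytic approaches; the simulated data, effect size, and the particular alternatives shown are assumptions made only for this example.

```python
# Illustrative robustness check: re-analyze the same (simulated) trial data
# under alternative analytic approaches and compare the resulting inferences.
# The data-generating step is hypothetical and stands in for a real data set.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100                                   # subjects per arm (assumed)
baseline = rng.normal(50, 10, 2 * n)      # hypothetical baseline covariate
effect = 3.0                              # hypothetical true treatment effect
arm = np.repeat([0, 1], n)
outcome = 0.5 * baseline + effect * arm + rng.normal(0, 8, 2 * n)

# Primary (pre-specified) analysis: two-sample t-test on the raw outcome.
t_stat, p_primary = stats.ttest_ind(outcome[arm == 1], outcome[arm == 0])

# Alternative 1: rank-based (Mann-Whitney) test, relaxing the normality assumption.
u_stat, p_rank = stats.mannwhitneyu(outcome[arm == 1], outcome[arm == 0])

# Alternative 2: analysis of change from baseline instead of the raw outcome.
change = outcome - baseline
t2, p_change = stats.ttest_ind(change[arm == 1], change[arm == 0])

print(f"primary p={p_primary:.4f}, rank-based p={p_rank:.4f}, "
      f"change-from-baseline p={p_change:.4f}")
# Broad agreement across these analyses would support a claim of robustness;
# material disagreement would prompt examination of the assumptions involved.
```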
Because the predominant approaches to the design and analysis of
clinical trials have been based on frequentist statistical methods,
the guidance largely refers to the use of frequentist methods (see
Glossary) when discussing hypothesis testing and/or confidence
intervals. This should not be taken to imply that other approaches
are not appropriate; the use of Bayesian (see Glossary) and other
approaches may be considered when the reasons for their use are
clear and when the resulting conclusions are sufficiently robust.
II. Considerations for Overall Clinical Development
2.1 Trial Context
2.1.1 Development Plan
The broad aim of the process of clinical development of a new
drug is to find out whether there is a dose range and schedule at
which the drug can be shown to be simultaneously safe and effective,
to the extent that the risk-benefit relationship is acceptable. The
particular subjects who may benefit from the drug, and the specific
indications for its use, also need to be defined.
Satisfying these broad aims usually requires an ordered program
of clinical trials, each with its own specific objectives (see ICH
E8). This should be specified in a clinical plan, or a series of
plans, with appropriate decision points and flexibility to allow
modification as knowledge accumulates. A marketing application
should clearly describe the main content of such plans, and the
contribution made by each trial. Interpretation and assessment of
the evidence from the total program of trials involves synthesis of
the evidence from the individual trials (see section 7.2). This is
facilitated by ensuring that common standards are adopted for a
number of features of the trials, such as dictionaries of medical
terms, definition and timing of the main measurements, handling of
protocol deviations, and so on. A statistical summary, overview, or
meta-analysis (see Glossary) may be informative when medical
questions are addressed in more than one trial. Where possible, this
should be envisaged in the plan so that the relevant trials are
clearly identified and any necessary common features of their
designs are specified in advance. Other major statistical issues (if
any) that are expected to affect a number of trials in a common plan
should be addressed in that plan.
2.1.2 Confirmatory Trial
A confirmatory trial is an adequately controlled trial in which
the hypotheses are stated in advance and evaluated. As a rule,
confirmatory trials are necessary to provide firm evidence of
efficacy or safety. In such trials the key hypothesis of interest
follows directly from the trial's primary objective, is always
predefined, and is the hypothesis that is subsequently tested when
the trial is complete. In a confirmatory trial, it is equally
important to estimate with due precision the size of the effects
attributable to the treatment of interest and to relate these
effects to their clinical significance.
Confirmatory trials are intended to provide firm evidence in
support of claims; hence adherence to protocols and standard
operating procedures is particularly important. Unavoidable changes
should be explained and documented, and their effect examined. A
justification of the design of each such trial and of other
important statistical aspects, such as the principal features of the
planned analysis, should be set out in the protocol. Each trial
should address only a limited number of questions.
Firm evidence in support of claims requires that the results of
the confirmatory trials demonstrate that the investigational product
under test has clinical benefits. The confirmatory trials should
therefore be sufficient to answer each key clinical question
relevant to the efficacy or safety claim clearly and definitively.
In addition, it is important that the basis for generalization (see
Glossary) to the intended patient population is understood and
explained; this may also influence the number and type (e.g.,
specialist or general practitioner) of centers and/or trials needed.
The results of the confirmatory trial(s) should be robust. In some
circumstances, the weight of evidence from a single confirmatory
trial may be sufficient.
2.1.3 Exploratory Trial
The rationale and design of confirmatory trials nearly always
rests on earlier clinical work carried out in a series of
exploratory studies. Like all clinical trials, these exploratory
studies should have clear and precise objectives. However, in
contrast to confirmatory trials, their objectives may not always
lead to simple tests of predefined hypotheses. In addition,
exploratory trials may sometimes require a more flexible approach to
design so that changes can be made in response to accumulating
results. Their analysis may entail data exploration. Tests of
hypothesis may be carried out, but the choice of hypothesis may be
data dependent. Such trials cannot be the basis of the formal proof
of efficacy, although they may contribute to the total body of
relevant evidence.
Any individual trial may have both confirmatory and exploratory
aspects. For example, in most confirmatory trials the data are also
subjected to exploratory analyses which serve as a basis for
explaining or supporting their findings and for suggesting further
hypotheses for later research. The protocol should make a clear
distinction between the aspects of a trial which will be used for
confirmatory proof and the aspects
which will provide data for exploratory analysis.
2.2 Scope of Trials
2.2.1 Population
In the earlier phases of drug development, the choice of
subjects for a clinical trial may be heavily influenced by the wish
to maximize the chance of observing specific clinical effects of
interest. Hence they may come from a very narrow subgroup of the
total patient population for which the drug may eventually be
indicated. However, by the time the confirmatory trials are
undertaken, the subjects in the trials should more closely mirror
the target population. In these trials, it is generally helpful to
relax the inclusion and exclusion criteria as much as possible
within the target population while maintaining sufficient
homogeneity to permit precise estimation of treatment effects. No
individual clinical trial can be expected to be totally
representative of future users because of the possible influences of
geographical location, the time when it is conducted, the medical
practices of the particular investigator(s) and clinics, and so on.
However, the influence of such factors should be reduced wherever
possible and subsequently discussed during the interpretation of the
trial results.
2.2.2 Primary and Secondary Variables
The primary variable (``target'' variable, primary endpoint)
should be the variable capable of providing the most clinically
relevant and convincing evidence directly related to the primary
objective of the trial. There should generally be only one primary
variable. This will usually be an efficacy variable, because the
primary objective of most confirmatory trials is to provide strong
scientific evidence regarding efficacy. Safety/tolerability may
sometimes be the primary variable, and will always be an important
consideration. Measurements relating to quality of life and health
economics are further potential primary variables. The selection of
the primary variable should reflect the accepted norms and standards
in the relevant field of research. The use of a reliable and
validated variable with which experience has been gained either in
earlier studies or in published literature is recommended. There
should be sufficient evidence that the primary variable can provide
a valid and reliable measure of some clinically relevant and
important treatment benefit in the patient population described by
the inclusion and exclusion criteria. The primary variable should
generally be the one used when estimating the sample size (see
section 3.5).
In many cases, the approach to assessing subject outcome may not
be straightforward and should be carefully defined. For example, it
is inadequate to specify mortality as a primary variable without
further clarification; mortality may be assessed by comparing
proportions alive at fixed points in time or by comparing overall
distributions of survival times over a specified interval. Another
common example is a recurring event; the measure of treatment effect
may again be a simple dichotomous variable (any occurrence during a
specified interval), time to first occurrence, rate of occurrence
(events per time units of observation), and so on. The assessment of
functional status over time in studying treatment for chronic
disease presents other challenges in selection of the primary
variable. There are many possible approaches, such as comparisons of
the assessments done at the beginning and end of the interval of
observation, comparisons of slopes calculated from all assessments
throughout the interval, comparisons of the proportions of subjects
exceeding or declining beyond a specified threshold, or comparisons
based on methods for repeated measures data. To avoid multiplicity
concerns arising from post hoc definitions, it is critical to
specify in the protocol the precise definition of the primary
variable as it will be used in the statistical analysis. In
addition, the clinical relevance of the specific primary variable
selected and the validity of the associated measurement procedures
will generally need to be addressed and justified in the protocol.
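To illustrate why the precise definition matters (an editorial sketch, not part of the ICH text), the following Python fragment derives three different candidate primary variables from the same hypothetical recurring-event records; the data structure and values are invented for this example.

```python
# Hypothetical per-subject records of a recurring event: each subject has a
# follow-up time (in days) and a list of event times within that follow-up.
# Three different primary-variable definitions are derived from the same data.
subjects = [
    {"id": 1, "followup": 180, "event_times": [30, 95]},
    {"id": 2, "followup": 200, "event_times": []},
    {"id": 3, "followup": 150, "event_times": [10]},
]

for s in subjects:
    any_event = len(s["event_times"]) > 0                         # dichotomous: any occurrence
    time_to_first = min(s["event_times"], default=s["followup"])  # time to first event, censored at follow-up
    rate = len(s["event_times"]) / s["followup"]                  # events per day of observation
    print(s["id"], any_event, time_to_first, round(rate, 4))
```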
The primary variable should be specified in the protocol, along
with the rationale for its selection. Redefinition of the primary
variable after unblinding will almost always be unacceptable, since
the biases this introduces are difficult to assess. When the
clinical effect defined by the primary objective is to be measured
in more than one way, the protocol should identify one of the
measurements as the primary variable on the basis of clinical
relevance, importance, objectivity, and/or other relevant
characteristics, whenever such selection is feasible.
Secondary variables are either supportive measurements related
to the primary objective or measurements of effects related to the
secondary objectives. Their predefinition in the protocol is also
important, as well as an explanation of their relative importance
and roles in interpretation of trial results. The number of
secondary variables should be limited and should be related to the
limited number of questions to be answered in the trial.
2.2.3 Composite Variables
If a single primary variable cannot be selected from multiple
measurements associated with the primary objective, another useful
strategy is to integrate or combine the multiple measurements into a
single or ``composite'' variable, using a predefined algorithm.
Indeed, the primary variable sometimes arises as a combination of
multiple clinical measurements (e.g., the rating scales used in
arthritis, psychiatric disorders, and elsewhere). This approach
addresses the multiplicity problem without requiring adjustment to
the Type I error. The method of combining the multiple measurements
should be specified in the protocol, and an interpretation of the
resulting scale should be provided in terms of the size of a
clinically relevant benefit. When a composite variable is used as a
primary variable, the components of this variable may sometimes be
analyzed separately, where clinically meaningful and validated. When
a rating scale is used as a primary variable, it is especially
important to address factors such as content validity (see
Glossary), inter- and intrarater reliability (see Glossary), and
responsiveness for detecting changes in the severity of disease.
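As a minimal sketch of a predefined combination algorithm (the component names, ranges, and equal weights are assumptions for illustration, not recommendations of this guidance), a composite score might be specified along the following lines; the essential point is that the rule is fixed in the protocol before the data are seen.

```python
# Minimal sketch of a pre-specified composite score combining several clinical
# measurements into a single primary variable. The component names, scoring
# ranges, and equal weighting are hypothetical choices for illustration.
COMPONENTS = ("pain_score", "function_score", "global_rating")   # each scored 0-10 (assumed)
WEIGHTS = {"pain_score": 1.0, "function_score": 1.0, "global_rating": 1.0}

def composite_score(assessment: dict) -> float:
    """Weighted sum of the predefined components (higher = worse, by assumption)."""
    return sum(WEIGHTS[c] * assessment[c] for c in COMPONENTS)

# Example subject assessment at a visit:
visit = {"pain_score": 4, "function_score": 6, "global_rating": 5}
print(composite_score(visit))   # 15.0 on a 0-30 scale under these assumptions
```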
2.2.4 Global Assessment Variables
In some cases, ``global assessment'' variables (see Glossary)
are developed to measure the overall safety, overall efficacy, and/
or overall usefulness of a treatment. This type of variable
integrates objective variables and the investigator's overall
impression about the state or change in the state of the subject,
and is usually a scale of ordered categorical ratings. Global
assessments of overall efficacy are well established in some
therapeutic areas, such as neurology and psychiatry.
Global assessment variables generally have a subjective
component. When a global assessment variable is used as a primary or
secondary variable, fuller details of the scale should be included
in the protocol with respect to:
(1) The relevance of the scale to the primary objective of the
trial;
(2) The basis for the validity and reliability of the scale;
(3) How to utilize the data collected on an individual subject
to assign him/her to a unique category of the scale;
(4) How to assign subjects with missing data to a unique
category of the scale, or otherwise evaluate them.
If objective variables are considered by the investigator when
making a global assessment, then those objective variables should be
considered as additional primary or, at least, important secondary
variables.
Global assessment of usefulness integrates components of both
benefit and risk and reflects the decisionmaking process of the
treating physician, who must weigh benefit and risk in making
product use decisions. A problem with global usefulness variables is
that their use could in some cases lead to the result of two
products being declared equivalent despite having very different
profiles of beneficial and adverse effects. For example, judging the
global usefulness of a treatment as equivalent or superior to an
alternative may mask the fact that it has little or no efficacy but
fewer adverse effects. Therefore, it is not advisable to use a
global usefulness variable as a primary variable. If global
usefulness is specified as primary, it is important to consider
specific efficacy and safety outcomes separately as additional
primary variables.
2.2.5 Multiple Primary Variables
It may sometimes be desirable to use more than one primary
variable, each of which (or a subset of which) could be sufficient
to cover the range of effects of the therapies. The planned manner
of interpretation of this type of evidence should be carefully
spelled out. It should be clear whether an impact on any of the
variables, some minimum number of them, or all of them, would be
considered necessary to achieve the trial objectives. The primary
hypothesis or hypotheses and parameters of interest (e.g., mean,
percentage, distribution) should be clearly stated with respect to
the primary variables identified, and the approach to statistical
inference described. The effect on the Type I error should be
explained because of the potential for multiplicity problems (see
section 5.6);
the method of controlling Type I error should be given in the
protocol. The extent of intercorrelation among the proposed primary
variables may be considered in evaluating the impact on Type I
error. If the purpose of the trial is to demonstrate effects on all
of the designated primary variables, then there is no need for
adjustment of the Type I error, but the impact on Type II error and
sample size should be carefully considered.
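For illustration only (the guidance does not prescribe any particular adjustment method), a simple Bonferroni-style control of the Type I error across several primary variables, applicable when an effect on any one of them would support the claim, could be pre-specified as in the following sketch; the p-values shown are invented.

```python
# Illustrative Bonferroni adjustment for k primary variables when success on
# ANY of them is to be claimed. The unadjusted p-values below are invented.
alpha = 0.05
p_values = {"variable_A": 0.012, "variable_B": 0.030, "variable_C": 0.200}
k = len(p_values)

for name, p in p_values.items():
    significant = p <= alpha / k          # each variable tested at alpha / k
    print(f"{name}: p={p:.3f}, significant at adjusted level "
          f"{alpha / k:.4f}: {significant}")
# If instead the objective requires an effect on ALL primary variables, no
# alpha adjustment is needed, but power (Type II error) must be reconsidered.
```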
2.2.6 Surrogate Variables
When direct assessment of the clinical benefit to the subject
through observing actual clinical efficacy is not practical,
indirect criteria (surrogate variables--see Glossary) may be
considered. Commonly accepted surrogate variables are used in a
number of indications where they are believed to be reliable
predictors of clinical benefit. There are two principal concerns
with the introduction of any proposed surrogate variable. First, it
may not be a true predictor of the clinical outcome of interest. For
example, it may measure treatment activity associated with one
specific pharmacological mechanism, but may not provide full
information on the range of actions and ultimate effects of the
treatment, whether positive or negative. There have been many
instances where treatments showing a highly positive effect on a
proposed surrogate have ultimately been shown to be detrimental to
the subjects' clinical outcome; conversely, there are cases of
treatments conferring clinical benefit without measurable impact on
proposed surrogates. Secondly, proposed surrogate variables may not
yield a quantitative measure of clinical benefit that can be weighed
directly against adverse effects. Statistical criteria for
validating surrogate variables have been proposed but the experience
with their use is relatively limited. In practice, the strength of
the evidence for surrogacy depends upon (i) the biological
plausibility of the relationship, (ii) the demonstration in
epidemiological studies of the prognostic value of the surrogate for
the clinical outcome, and (iii) evidence from clinical trials that
treatment effects on the surrogate correspond to effects on the
clinical outcome. Relationships between clinical and surrogate
variables for one product do not necessarily apply to a product with
a different mode of action for treating the same disease.
2.2.7 Categorized Variables
Dichotomization or other categorization of continuous or ordinal
variables may sometimes be desirable. Criteria of ``success'' and
``response'' are common examples of dichotomies that should be
specified precisely in terms of, for example, a minimum percentage
improvement (relative to baseline) in a continuous variable or a
ranking categorized as at or above some threshold level (e.g.,
``good'') on an ordinal rating scale. The reduction of diastolic
blood pressure below 90 mmHg is a common dichotomization.
Categorizations are most useful when they have clear clinical
relevance. The criteria for categorization should be predefined and
specified in the protocol, as knowledge of trial results could
easily bias the choice of such criteria. Because categorization
normally implies a loss of information, a consequence will be a loss
of power in the analysis; this should be accounted for in the sample
size calculation.
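A small simulation (purely illustrative; the effect size, variability, and sample size are assumptions) can make the power cost of dichotomization concrete, here using the diastolic blood pressure example above.

```python
# Illustrative simulation of the power lost by dichotomizing a continuous
# outcome (diastolic BP "responder" defined as < 90 mmHg) instead of
# comparing the measured values directly. The effect size, variability, and
# sample size below are assumptions chosen only for this illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sims = 80, 2000
mean_control, mean_treated, sd = 95.0, 91.0, 10.0

hits_continuous = hits_dichotomized = 0
for _ in range(sims):
    control = rng.normal(mean_control, sd, n)
    treated = rng.normal(mean_treated, sd, n)
    # Continuous analysis: two-sample t-test on the measured values.
    if stats.ttest_ind(treated, control).pvalue < 0.05:
        hits_continuous += 1
    # Dichotomized analysis: chi-squared test on responder proportions.
    table = [[int((treated < 90).sum()), int((treated >= 90).sum())],
             [int((control < 90).sum()), int((control >= 90).sum())]]
    chi2, p_dich, dof, expected = stats.chi2_contingency(table)
    if p_dich < 0.05:
        hits_dichotomized += 1

print(f"power, continuous analysis:   {hits_continuous / sims:.2f}")
print(f"power, dichotomized analysis: {hits_dichotomized / sims:.2f}")
```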
2.3 Design Techniques to Avoid Bias
The most important design techniques for avoiding bias in
clinical trials are blinding and randomization, and these should be
normal features of most controlled clinical trials intended to be
included in a marketing application. Most such trials follow a
double-blind approach in which treatments are prepacked in
accordance with a suitable randomization schedule, and supplied to
the trial center(s) labeled only with the subject number and the
treatment period, so that no one involved in the conduct of the
trial is aware of the specific treatment allocated to any particular
subject, not even as a code letter. This approach will be assumed in
section 2.3.1 and most of section 2.3.2, exceptions being considered
at the end.
Bias can also be reduced at the design stage by specifying
procedures in the protocol aimed at minimizing any anticipated
irregularities in trial conduct that might impair a satisfactory
analysis, including various types of protocol violations,
withdrawals and missing values. The protocol should consider ways
both to reduce the frequency of such problems and to handle the
problems that do occur in the analysis of data.
2.3.1 Blinding
Blinding or masking is intended to limit the occurrence of
conscious and unconscious bias in the conduct and interpretation of
a clinical trial arising from the influence that the knowledge of
treatment may have on the recruitment and allocation of subjects,
their subsequent care, the attitudes of subjects to the treatments,
the assessment of end-points, the handling of withdrawals, the
exclusion of data from analysis, and so on. The essential aim is to
prevent identification of the treatments until all such
opportunities for bias have passed.
A double-blind trial is one in which neither the subject nor any
of the investigator or sponsor staff involved in the treatment or
clinical evaluation of the subjects are aware of the treatment
received. This includes anyone determining subject eligibility,
evaluating endpoints, or assessing compliance with the protocol.
This level of blinding is maintained throughout the conduct of the
trial, and only when the data are cleaned to an acceptable level of
quality will appropriate personnel be unblinded. If any of the
sponsor staff who are not involved in the treatment or clinical
evaluation of the subjects are required to be unblinded to the
treatment code (e.g., bioanalytical scientists, auditors, those
involved in serious adverse event reporting), the sponsor should
have adequate standard operating procedures to guard against
inappropriate dissemination of treatment codes. In a single-blind
trial the investigator and/or his staff are aware of the treatment
but the subject is not, or vice versa. In an open-label trial the
identity of treatment is known to all. The double-blind trial is the
optimal approach. This requires that the treatments to be applied
during the trial cannot be distinguished (by appearance, taste,
etc.) either before or during administration, and that the blind is
maintained appropriately during the whole trial.
Difficulties in achieving the double-blind ideal can arise: The
treatments may be of a completely different nature, for example,
surgery and drug therapy; two drugs may have different formulations
and, although they could be made indistinguishable by the use of
capsules, changing the formulation might also change the
pharmacokinetic and/or pharmacodynamic properties and hence
necessitate that bioequivalence of the formulations be established;
the daily pattern of administration of two treatments may differ.
One way of achieving double-blind conditions under these
circumstances is to use a ``double-dummy'' (see Glossary) technique.
This technique may sometimes force an administration scheme that is
sufficiently unusual to influence adversely the motivation and
compliance of the subjects. Ethical difficulties may also interfere
with its use when, for example, it entails dummy operative
procedures. Nevertheless, extensive efforts should be made to
overcome these difficulties.
The double-blind nature of some clinical trials may be partially
compromised by apparent treatment induced effects. In such cases,
blinding may be improved by blinding investigators and relevant
sponsor staff to certain test results (e.g., selected clinical
laboratory measures). Similar approaches (see below) to minimizing
bias in open-label trials should be considered in trials where
unique or specific treatment effects may lead to unblinding
individual patients.
If a double-blind trial is not feasible, then the single-blind
option should be considered. In some cases only an open-label trial
is practically or ethically possible. Single-blind and open-label
trials provide additional flexibility, but it is particularly
important that the investigator's knowledge of the next treatment
should not influence the decision to enter the subject; this
decision should precede knowledge of the randomized treatment. For
these trials, consideration should be given to the use of a
centralized randomization method, such as telephone randomization,
to administer the assignment of randomized treatment. In addition,
clinical assessments should be made by medical staff who are not
involved in treating the subjects and who remain blind to treatment.
In single-blind or open-label trials every effort should be made to
minimize the various known sources of bias and primary variables
should be as objective as possible. The reasons for the degree of
blinding adopted, as well as steps taken to minimize bias by other
means, should be explained in the protocol. For example, the sponsor
should have adequate standard operating procedures to ensure that
access to the treatment code is appropriately restricted during the
process of cleaning the database prior to its release for analysis.
Breaking the blind (for a single subject) should be considered
only when knowledge of the treatment assignment is deemed essential
by the subject's physician for the subject's care. Any intentional
or unintentional breaking of the blind should be reported and
explained at the end of the trial,
irrespective of the reason for its occurrence. The procedure and
timing for revealing the treatment assignments should be documented.
In this document, the blind review (see Glossary) of data refers
to the checking of data during the period of time between trial
completion (the last observation on the last subject) and the
breaking of the blind.
2.3.2 Randomization
Randomization introduces a deliberate element of chance into the
assignment of treatments to subjects in a clinical trial. During
subsequent analysis of the trial data, it provides a sound
statistical basis for the quantitative evaluation of the evidence
relating to treatment effects. It also tends to produce treatment
groups in which the distributions of prognostic factors, known and
unknown, are similar. In combination with blinding, randomization
helps to avoid possible bias in the selection and allocation of
subjects arising from the predictability of treatment assignments.
The randomization schedule of a clinical trial documents the
random allocation of treatments to subjects. In the simplest
situation it is a sequential list of treatments (or treatment
sequences in a crossover trial) or corresponding codes by subject
number. The logistics of some trials, such as those with a screening
phase, may make matters more complicated, but the unique preplanned
assignment of treatment, or treatment sequence, to subject should be
clear. Different trial designs will necessitate different procedures
for generating randomization schedules. The randomization schedule
should be reproducible (if the need arises).
Although unrestricted randomization is an acceptable approach,
some advantages can generally be gained by randomizing subjects in
blocks. This helps to increase the comparability of the treatment
groups, particularly when subject characteristics may change over
time, as a result, for example, of changes in recruitment policy. It
also provides a better guarantee that the treatment groups will be
of nearly equal size. In crossover trials, it provides the means of
obtaining balanced designs with their greater efficiency and easier
interpretation. Care should be taken to choose block lengths that
are sufficiently short to limit possible imbalance, but that are
long enough to avoid predictability towards the end of the sequence
in a block. Investigators and other relevant staff should generally
be blind to the block length; the use of two or more block lengths,
randomly selected for each block, can achieve the same purpose.
(Theoretically, in a double-blind trial predictability does not
matter, but the pharmacological effects of drugs may provide the
opportunity for intelligent guesswork.)
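The following sketch (a hypothetical illustration, not a validated randomization system) generates a blocked schedule for two treatments in which the block length is itself chosen at random from two permissible lengths, as suggested above, to limit predictability.

```python
# Illustrative generation of a blocked randomization schedule for two
# treatments (A, B) with the block length chosen at random from two options
# (block lengths assumed even). A real system would add seed management,
# audit trails, and secure storage of the schedule.
import random

def blocked_schedule(n_subjects: int, block_lengths=(4, 6), seed: int = 2024):
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        length = rng.choice(block_lengths)      # randomly selected block length
        block = ["A", "B"] * (length // 2)      # equal allocation within the block
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_subjects]

print(blocked_schedule(12))
# e.g. ['B', 'A', 'A', 'B', ...] -- balanced within each block
```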
In multicenter trials (see Glossary), the randomization
procedures should be organized centrally. It is advisable to have a
separate random scheme for each center, i.e., to stratify by center
or to allocate several whole blocks to each center. More generally,
stratification by important prognostic factors measured at baseline
(e.g., severity of disease, age, sex) may sometimes be valuable in
order to promote balanced allocation within strata; this has greater
potential benefit in small trials. The use of more than two or three
stratification factors is rarely necessary, is less successful at
achieving balance, and is logistically troublesome. The use of a
dynamic allocation procedure (see below) may help to achieve balance
across a number of stratification factors simultaneously, provided
the rest of the trial procedures can be adjusted to accommodate an
approach of this type. Factors on which randomization has been
stratified should be accounted for later in the analysis.
The next subject to be randomized into a trial should always
receive the treatment corresponding to the next free number in the
appropriate randomization schedule (in the respective stratum, if
randomization is stratified). The appropriate number and associated
treatment for the next subject should only be allocated when entry
of that subject to the randomized part of the trial has been
confirmed. Details of the randomization that facilitate
predictability (e.g., block length) should not be contained in the
trial protocol. The randomization schedule itself should be filed
securely by the sponsor or an independent party in a manner that
ensures that blindness is properly maintained throughout the trial.
Access to the randomization schedule during the trial should take
into account the possibility that, in an emergency, the blind may
have to be broken for any subject. The procedure to be followed, the
necessary documentation, and the subsequent treatment and assessment
of the subject should all be described in the protocol.
Dynamic allocation is an alternative procedure in which the
allocation of treatment to a subject is influenced by the current
balance of allocated treatments and, in a stratified trial, by the
stratum to which the subject belongs and the balance within that
stratum. Deterministic dynamic allocation procedures should be
avoided and an appropriate element of randomization should be
incorporated for each treatment allocation. Every effort should be
made to retain the double-blind status of the trial. For example,
knowledge of the treatment code may be restricted to a central trial
office from where the dynamic allocation is controlled, generally
through telephone contact. This in turn permits additional checks of
eligibility criteria and establishes entry into the trial, features
that can be valuable in certain types of multicenter trials. The
usual system of prepacking and labeling drug supplies for double-
blind trials can then be followed, but the order of their use is no
longer sequential. It is desirable to use appropriate computer
algorithms to keep personnel at the central trial office blind to
the treatment code. The complexity of the logistics and potential
impact on the analysis should be carefully evaluated when
considering dynamic allocation.
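Purely as an illustration of a nondeterministic dynamic allocation rule (a minimization-style procedure with a random element; the factors, probabilities, and tie-handling below are assumptions, not requirements of this guidance):

```python
# Illustrative minimization-style dynamic allocation with a random element.
# Each new subject is assigned, with high probability, to the treatment that
# best balances the chosen stratification factors; all details are hypothetical.
import random

rng = random.Random(7)
factors = ("sex", "disease_severity")
counts = {t: {f: {} for f in factors} for t in ("A", "B")}   # running tallies

def allocate(subject: dict, p_favor: float = 0.8) -> str:
    """Return 'A' or 'B' for a subject described by its factor levels."""
    imbalance = {}
    for t in ("A", "B"):
        imbalance[t] = sum(counts[t][f].get(subject[f], 0) for f in factors)
    preferred = min(imbalance, key=imbalance.get)   # treatment improving balance (ties default to A here)
    other = "B" if preferred == "A" else "A"
    choice = preferred if rng.random() < p_favor else other   # random element
    for f in factors:
        counts[choice][f][subject[f]] = counts[choice][f].get(subject[f], 0) + 1
    return choice

for subj in [{"sex": "F", "disease_severity": "mild"},
             {"sex": "M", "disease_severity": "severe"},
             {"sex": "F", "disease_severity": "severe"}]:
    print(allocate(subj))
```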
III. Trial Design Considerations
3.1 Design Configuration
3.1.1 Parallel Group Design
The most common clinical trial design for confirmatory trials is
the parallel group design in which subjects are randomized to one of
two or more arms, each arm being allocated a different treatment.
These treatments will include the investigational product at one or
more doses, and one or more control treatments, such as placebo and/
or an active comparator. The assumptions underlying this design are
less complex than for most other designs. However, as with other
designs, there may be additional features of the trial that
complicate the analysis and interpretation (e.g., covariates,
repeated measurements over time, interactions between design
factors, protocol violations, dropouts (see Glossary), and
withdrawals).
3.1.2 Crossover Design
In the crossover design, each subject is randomized to a
sequence of two or more treatments and hence acts as his own control
for treatment comparisons. This simple maneuver is attractive
primarily because it reduces the number of subjects and usually the
number of assessments needed to achieve a specific power, sometimes
to a marked extent. In the simplest 2×2 crossover design, each
subject receives each of two treatments in randomized order in two
successive treatment periods, often separated by a washout period.
The most common extension of this entails comparing n (>2)
treatments in n periods, each subject receiving all n treatments.
Numerous variations exist, such as designs in which each subject
receives a subset of n (>2) treatments, or designs in which
treatments are repeated within a subject.
Crossover designs have a number of problems that can invalidate
their results. The chief difficulty concerns carryover, that is, the
residual influence of treatments in subsequent treatment periods. In
an additive model, the effect of unequal carryover will be to bias
direct treatment comparisons. In the 2×2 design, the carryover
effect cannot be statistically distinguished from the interaction
between treatment and period and the test for either of these
effects lacks power because the corresponding contrast is ``between
subject.'' This problem is less acute in higher order designs, but
cannot be entirely dismissed.
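In a standard additive model for the 2×2 design (textbook notation, not part of the ICH text), the aliasing described above can be written as follows.

```latex
% Additive model for the 2x2 crossover (illustrative notation):
% Y_{ijk} = response of subject k in sequence i during period j.
\[
Y_{ijk} = \mu + s_{ik} + \pi_j + \tau_{d(i,j)} + \lambda_{d(i,j-1)} + e_{ijk}
\]
% \mu: overall mean;  s_{ik}: subject effect;  \pi_j: period effect;
% \tau_{d(i,j)}: direct effect of the treatment given in period j of sequence i;
% \lambda_{d(i,j-1)}: carryover of the treatment given in the preceding period.
% In the 2x2 design the contrast estimating \lambda is a between-subject
% (sequence) comparison, which is also where a treatment-by-period interaction
% would appear; the two effects cannot be statistically separated.
```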
When the crossover design is used, it is therefore important to
avoid carryover. This is best done by selective and careful use of
the design on the basis of adequate knowledge of both the disease
area and the new medication. The disease under study should be
chronic and stable. The relevant effects of the medication should
develop fully within the treatment period. The washout periods
should be sufficiently long for complete reversibility of drug
effect. The fact that these conditions are likely to be met should
be established in advance of the trial by means of prior information
and data.
There are additional problems that need careful attention in
crossover trials. The most notable of these are the complications of
analysis and interpretation arising from the loss of subjects. Also,
the potential for carryover leads to difficulties in assigning
adverse events that occur in later treatment periods to the
appropriate treatment. These and other issues are described in ICH
E4. The crossover design should generally be restricted to
situations where losses of
subjects from the trial are expected to be small.
A common, and generally satisfactory, use of the 2×2 crossover
design is to demonstrate the bioequivalence of two formulations of
the same medication. In this particular application in healthy
volunteers, carryover effects on the relevant pharmacokinetic
variable are most unlikely to occur if the wash-out time between the
two periods is sufficiently long. However, it is still important to
check this assumption during analysis on the basis of the data
obtained, for example, by demonstrating that no drug is detectable
at the start of each period.
3.1.3 Factorial Designs
In a factorial design, two or more treatments are evaluated
simultaneously through the use of varying combinations of the
treatments. The simplest example is the 2×2 factorial design in
which subjects are randomly allocated to one of the four possible
combinations of two treatments, A and B. These are: A alone; B
alone; both A and B; neither A nor B. In many cases, this design is
used for the specific purpose of examining the interaction of A and
B. The statistical test of interaction may lack power to detect an
interaction if the sample size was calculated based on the test for
main effects. This consideration is important when this design is
used for examining the joint effects of A and B, in particular, if
the treatments are likely to be used together.
Another important use of the factorial design is to establish
the dose-response characteristics of the simultaneous use of
treatments C and D, especially when the efficacy of each monotherapy
has been established at some dose in prior trials. A number, m, of
doses of C is selected, usually including a zero dose (placebo), and
a similar number, n, of doses of D. The full design then consists of
m × n treatment groups, each receiving a different combination of
doses of C and D. The resulting estimate of the response surface may
then be used to help identify an appropriate combination of doses of
C and D for clinical use (see ICH E4).
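As a hypothetical enumeration of the m × n layout (the dose levels and units below are invented for illustration):

```python
# Illustrative enumeration of the m x n treatment groups in a factorial
# dose-response design; the dose levels chosen here are hypothetical.
from itertools import product

doses_C = [0, 10, 20]        # m = 3 doses of C, including a zero dose (placebo)
doses_D = [0, 5]             # n = 2 doses of D, including a zero dose (placebo)

groups = list(product(doses_C, doses_D))     # m * n = 6 dose combinations
for i, (c, d) in enumerate(groups, start=1):
    print(f"group {i}: C = {c} mg, D = {d} mg")
# The observed responses across this grid can then be used to fit a response
# surface and identify a suitable dose combination for clinical use.
```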
In some cases, the 2×2 design may be used to make efficient use
of clinical trial subjects by evaluating the efficacy of the two
treatments with the same number of subjects as would be required to
evaluate the efficacy of either one alone. This strategy has proved
to be particularly valuable for very large mortality trials. The
efficiency and validity of this approach depends upon the absence of
interaction between treatments A and B so that the effects of A and
B on the primary efficacy variables follow an additive model. Hence
the effect of A is virtually identical whether or not it is
additional to the effect of B. As for the crossover trial, evidence
that this condition is likely to be met should be established in
advance of the trial by means of prior information and data.
3.2 Multicenter Trials
Multicenter trials are carried out for two main reasons. First,
a multicenter trial is an accepted way of evaluating a new
medication more efficiently. Under some circumstances, it may
present the only practical means of accruing sufficient subjects to
satisfy the trial objective within a reasonable timeframe.
Multicenter trials of this nature may, in principle, be carried out
at any stage of clinical development. They may have several centers
with a large number of subjects per center or, in the case of a rare
disease, they may have a large number of centers with very few
subjects per center.
Second, a trial may be designed as a multicenter (and multi-
investigator) trial primarily to provide a better basis for the
subsequent generalization of its findings. This arises from the
possibility of recruiting the subjects from a wider population and
of administering the medication in a broader range of clinical
settings, thus presenting an experimental situation that is more
typical of future use. In this case, the involvement of a number of
investigators also gives the potential for a wider range of clinical
judgment concerning the value of the medication. Such a trial would
be a confirmatory trial in the later phases of drug development and
would be likely to involve a large number of investigators and
centers. It might sometimes be conducted in a number of different
countries to facilitate generalizability (see Glossary) even
further.
If a multicenter trial is to be meaningfully interpreted and
extrapolated, then the manner in which the protocol is implemented
should be clear and similar at all centers. Furthermore, the usual
sample size and power calculations depend upon the assumption that
the differences between the compared treatments in the centers are
unbiased estimates of the same quantity. It is important to design
the common protocol and to conduct the trial with this background in
mind. Procedures should be standardized as completely as possible.
Variation of evaluation criteria and schemes can be reduced by
investigator meetings, by the training of personnel in advance of
the trial, and by careful monitoring during the trial. Good design
should generally aim to achieve the same distribution of subjects to
treatments within each center and good management should maintain
this design objective. Trials that avoid excessive variation in the
numbers of subjects per center and trials that avoid a few very
small centers have advantages if it is later found necessary to take
into account the heterogeneity of the treatment effect from center
to center, because they reduce the differences between different
weighted estimates of the treatment effect. (This point does not
apply to trials in which all centers are very small and in which
center does not feature in the analysis.) Failure to take these
precautions, combined with doubts about the homogeneity of the
results, may, in severe cases, reduce the value of a multicenter
trial to such a degree that it cannot be regarded as giving
convincing evidence for the sponsor's claims.
In the simplest multicenter trial, each investigator will be
responsible for the subjects recruited at one hospital, so that
``center'' is identified uniquely by either investigator or
hospital. In many trials, however, the situation is more complex.
One investigator may recruit subjects from several hospitals; one
investigator may represent a team of clinicians (subinvestigators)
who all recruit subjects from their own clinics at one hospital or
at several associated hospitals. Whenever there is room for doubt
about the definition of center in a statistical model, the
statistical section of the protocol (see section 5.1) should clearly
define the term (e.g., by investigator, location or region) in the
context of the particular trial. In most instances, centers can be
satisfactorily defined through the investigators. (ICH E6 provides
relevant guidance in this respect.) In cases of doubt, the aim
should be to define centers to achieve homogeneity in the important
factors affecting the measurements of the primary variables and the
influence of the treatments. Any rules for combining centers in the
analysis should be justified and specified prospectively in the
protocol where possible, but in any case decisions concerning this
approach should always be taken blind to treatment, for example, at
the time of the blind review.
The statistical model to be adopted for the estimation and
testing of treatment effects should be described in the protocol.
The main treatment effect may be investigated first using a model
that allows for center differences, but does not include a term for
treatment-by-center interaction. If the treatment effect is
homogeneous across centers, the routine inclusion of interaction
terms in the model reduces the efficiency of the test for the main
effects. In the presence of true heterogeneity of treatment effects,
the interpretation of the main treatment effect is controversial.
In some trials, for example, some large mortality trials with
very few subjects per center, there may be no reason to expect the
centers to have any influence on the primary or secondary variables
because they are unlikely to represent influences of clinical
importance. In other trials, it may be recognized from the start
that the limited numbers of subjects per center will make it
impracticable to include the center effects in the statistical
model. In these cases, it is not considered appropriate to include a
term for center in the model, and it is not necessary to stratify
the randomization by center in this situation.
If positive treatment effects are found in a trial with
appreciable numbers of subjects per center, there should generally
be an exploration of the heterogeneity of treatment effects across
centers, as this may affect the generalizability of the conclusions.
Marked heterogeneity may be identified by graphical display of the
results of individual centers or by analytical methods, such as a
significance test of the treatment-by-center interaction. When using
such a statistical significance test, it is important to recognize
that this generally has low power in a trial designed to detect the
main effect of treatment.
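
For illustration only (not part of the guidance), the following Python sketch shows one way such an analysis might be set up: a main-effects model allowing for center differences, followed by a model adding the treatment-by-center interaction for exploration. The simulated data, the use of the statsmodels library, and the fixed-effects formulation are assumptions for the example rather than recommendations.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Simulated multicenter data (illustrative assumptions only): six centers,
    # a continuous primary variable, and a common underlying treatment effect.
    rng = np.random.default_rng(0)
    rows = []
    for center in ["C1", "C2", "C3", "C4", "C5", "C6"]:
        for treatment in (0, 1):
            for y in rng.normal(loc=0.5 * treatment, scale=1.0, size=20):
                rows.append({"center": center, "treatment": treatment,
                             "response": y})
    df = pd.DataFrame(rows)

    # Main-effects model: treatment effect adjusted for center differences,
    # without a treatment-by-center interaction term.
    main = smf.ols("response ~ C(treatment) + C(center)", data=df).fit()
    print(main.params["C(treatment)[T.1]"])   # estimated treatment effect

    # Adding the interaction term, used only to explore heterogeneity of the
    # treatment effect across centers; the F-test comparing the two models is
    # the (typically low-powered) test of treatment-by-center interaction.
    inter = smf.ols("response ~ C(treatment) * C(center)", data=df).fit()
    print(sm.stats.anova_lm(main, inter))
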
If heterogeneity of treatment effects is found, this should be
interpreted with care, and vigorous attempts should be made to find
an explanation in terms of other features of trial management or
subject characteristics. Such an explanation will usually suggest
appropriate further analysis and interpretation. In the absence of
an explanation, heterogeneity of treatment effect, as evidenced, for
example, by marked quantitative interactions (see Glossary) implies
that alternative estimates of the
treatment effect, giving different weights to the centers, may be
needed to substantiate the robustness of the estimates of treatment
effect. It is even more important to understand the basis of any
heterogeneity characterized by marked qualitative interactions (see
Glossary), and failure to find an explanation may necessitate
further clinical trials before the treatment effect can be reliably
predicted.
Up to this point, the discussion of multicenter trials has been
based on the use of fixed effect models. Mixed models may also be
used to explore the heterogeneity of the treatment effect. These
models consider center and treatment-by-center effects to be random
and are especially relevant when the number of sites is large.
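
As a sketch of the mixed-model alternative just described, again using simulated data and assuming statsmodels as the tool, center and treatment-by-center effects may be treated as random:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated data for many small centers (illustrative assumptions only).
    rng = np.random.default_rng(2)
    rows = []
    for center in range(1, 31):
        center_shift = rng.normal(0.0, 0.5)
        for treatment in (0, 1):
            for _ in range(5):
                y = 0.4 * treatment + center_shift + rng.normal(0.0, 1.0)
                rows.append({"center": center, "treatment": treatment, "y": y})
    df = pd.DataFrame(rows)

    # Mixed model: random center intercepts and random treatment-by-center
    # effects, which is especially relevant when the number of sites is large.
    mixed = smf.mixedlm("y ~ treatment", data=df, groups=df["center"],
                        re_formula="~treatment").fit()
    print(mixed.summary())
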
3.3 Type of Comparison
3.3.1 Trials to Show Superiority
Scientifically, efficacy is most convincingly established by
demonstrating superiority to placebo in a placebo-controlled trial,
by showing superiority to an active control treatment, or by
demonstrating a dose-response relationship. This type of trial is
referred to as a ``superiority'' trial (see Glossary). In this
guidance superiority trials are generally assumed, unless explicitly
stated otherwise.
For serious illnesses, when a therapeutic treatment that has
been shown to be efficacious by superiority trial(s) exists, a
placebo-controlled trial may be considered unethical. In that case
the scientifically sound use of an active treatment as a control
should be considered. The appropriateness of placebo control versus
active control should be considered on a trial-by-trial basis.
3.3.2 Trials to Show Equivalence or Noninferiority
In some cases, an investigational product is compared to a
reference treatment without the objective of showing superiority.
This type of trial is divided into two major categories according to
its objective; one is an ``equivalence'' trial (see Glossary) and
the other is a ``noninferiority'' trial (see Glossary).
Bioequivalence trials fall into the former category. In some
situations, clinical equivalence trials are also undertaken for
other regulatory reasons such as demonstrating the clinical
equivalence of a generic product to the marketed product when the
compound is not absorbed and therefore not present in the blood
stream.
Many active control trials are designed to show that the
efficacy of an investigational product is no worse than that of the
active comparator and, hence, fall into the latter category. Another
possibility is a trial in which multiple doses of the
investigational drug are compared with the recommended dose or
multiple doses of the standard drug. The purpose of this design is
simultaneously to show a dose-response relationship for the
investigational product and to compare the investigational product
with the active control.
Active control equivalence or noninferiority trials may also
incorporate a placebo, thus pursuing multiple goals in one trial.
For example, they may establish superiority to placebo and hence
validate the trial design and simultaneously evaluate the degree of
similarity of efficacy and safety to the active comparator. There
are well-known difficulties associated with the use of the active
control equivalence (or noninferiority) trials that do not
incorporate a placebo or do not use multiple doses of the new drug.
These relate to the implicit lack of any measure of internal
validity (in contrast to superiority trials), thus making external
validation necessary. The equivalence (or noninferiority) trial is
not conservative in nature, so that many flaws in the design or
conduct of the trial will tend to bias the results towards a
conclusion of equivalence. For these reasons, the design features of
such trials should receive special attention and their conduct needs
special care. For example, it is especially important to minimize
the incidence of violations of the entry criteria, noncompliance,
withdrawals, losses to follow-up, missing data, and other deviations
from the protocol, and also to minimize their impact on the
subsequent analyses.
Active comparators should be chosen with care. An example of a
suitable active comparator would be a widely used therapy whose
efficacy in the relevant indication has been clearly established and
quantified in well-designed and well-documented superiority trial(s)
and that can be reliably expected to exhibit similar efficacy in the
contemplated active control trial. To this end, the new trial should
have the same important design features (primary variables, the dose
of the active comparator, eligibility criteria, and so on) as the
previously conducted superiority trials in which the active
comparator clearly demonstrated clinically relevant efficacy, taking
into account advances in medical or statistical practice relevant to
the new trial.
It is vital that the protocol of a trial designed to demonstrate
equivalence or noninferiority contain a clear statement that this is
its explicit intention. An equivalence margin should be specified in
the protocol; this margin is the largest difference that can be
judged as being clinically acceptable and should be smaller than
differences observed in superiority trials of the active comparator.
For the active control equivalence trial, both the upper and the
lower equivalence margins are needed, while only the lower margin is
needed for the active control noninferiority trial. The choice of
equivalence margins should be justified clinically.
Statistical analysis is generally based on the use of confidence
intervals (see section 5.5). For equivalence trials, two-sided
confidence intervals should be used. Equivalence is inferred when
the entire confidence interval falls within the equivalence margins.
Operationally, this is equivalent to the method of using two
simultaneous one-sided tests to test the (composite) null hypothesis
that the treatment difference is outside the equivalence margins
versus the (composite) alternative hypothesis that the treatment
difference is within the margins. Because the two null hypotheses
are disjoint, the Type I error is appropriately controlled. For
noninferiority trials, a one-sided interval should be used. The
confidence interval approach has a one-sided hypothesis test
counterpart for testing the null hypothesis that the treatment
difference (investigational product minus control) is equal to the
lower equivalence margin versus the alternative that the treatment
difference is greater than the lower equivalence margin. The choice
of Type I error should be a consideration separate from the use of a
one-sided or two-sided procedure. Sample size calculations should be
based on these methods (see section 3.5).
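
A minimal sketch of the confidence interval approach described above follows; the summary statistics, equivalence margins, one-sided alpha of 0.05, and use of scipy are hypothetical assumptions chosen only to make the logic concrete.

    from scipy import stats

    # Hypothetical summary data: difference in means (investigational product
    # minus control), its standard error, and degrees of freedom.
    diff, se, dof = 0.8, 1.1, 198.0
    lower_margin, upper_margin = -3.0, 3.0   # clinically justified margins
    alpha = 0.05                             # one-sided Type I error (assumed)

    # Equivalence: two simultaneous one-sided tests, operationally the same as
    # requiring the corresponding two-sided 90% CI to lie within the margins.
    t_crit = stats.t.ppf(1 - alpha, dof)
    ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
    equivalent = (ci_low > lower_margin) and (ci_high < upper_margin)

    # Noninferiority: a one-sided test against the lower margin only.
    t_stat = (diff - lower_margin) / se
    p_noninf = 1 - stats.t.cdf(t_stat, dof)
    noninferior = ci_low > lower_margin

    print(f"90% CI: ({ci_low:.2f}, {ci_high:.2f}); equivalence: {equivalent}")
    print(f"noninferiority p-value: {p_noninf:.4f}; noninferior: {noninferior}")
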
Concluding equivalence or noninferiority based on observing a
nonsignificant test result of the null hypothesis that there is no
difference between the investigational product and the active
comparator is considered inappropriate.
There are also special issues in the choice of analysis sets.
Subjects who withdraw or drop out of the treatment group or the
comparator group will tend to have a lack of response; hence the
results of using the full analysis set (see Glossary) may be biased
toward demonstrating equivalence (see section 5.2.3).
3.3.3 Trials to Show Dose-Response Relationship
How response is related to the dose of a new investigational
product is a question to which answers may be obtained in all phases
of development and by a variety of approaches (see ICH E4). Dose-
response trials may serve a number of objectives, among which the
following are of particular importance: The confirmation of
efficacy; the investigation of the shape and location of the dose-
response curve; the estimation of an appropriate starting dose; the
identification of optimal strategies for individual dose
adjustments; the determination of a maximal dose beyond which
additional benefit would be unlikely to occur. These objectives
should be addressed using the data collected at a number of doses
under investigation, including a placebo (zero dose) wherever
appropriate. For this purpose, the application of procedures to
estimate the relationship between dose and response, including the
construction of confidence intervals and the use of graphical
methods, is as important as the use of statistical tests. The
hypothesis tests that are used may need to be tailored to the
natural ordering of doses or to particular questions regarding the
shape of the dose-response curve (e.g., monotonicity). The details
of the planned statistical procedures should be given in the
protocol.
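
As an illustration of an estimation approach that respects the natural ordering of doses, the following sketch regresses the response on dose in simulated parallel groups and reports the slope with its confidence interval alongside the group means; the simulated data and the linear-in-dose model are assumptions for the example only.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated parallel dose groups including placebo (zero dose).
    rng = np.random.default_rng(3)
    rows = []
    for dose in (0.0, 10.0, 20.0, 40.0):
        for y in rng.normal(loc=0.02 * dose, scale=1.0, size=30):
            rows.append({"dose": dose, "response": y})
    df = pd.DataFrame(rows)

    # Simple trend estimate exploiting the ordering of doses; a graphical
    # display of the group means against dose would normally accompany this.
    fit = smf.ols("response ~ dose", data=df).fit()
    print(fit.params["dose"], fit.conf_int().loc["dose"].tolist())
    print(df.groupby("dose")["response"].mean())
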
3.4 Group Sequential Designs
Group sequential designs are used to facilitate the conduct of
interim analysis (see section 4.5 and Glossary). While group
sequential designs are not the only acceptable types of designs
permitting interim analysis, they are the most commonly applied
because it is more practicable to assess grouped subject outcomes at
periodic intervals during the trial than on a continuous basis as
data from each subject become available. The statistical methods
should be fully specified in advance of the availability of
information on treatment outcomes and subject treatment assignments
(i.e., blind breaking, see section 4.5). An
independent data monitoring committee (IDMC) (see Glossary) may be
used to review or to conduct the interim analysis of data arising
from a group sequential design (see section 4.6). While the design
has been most widely and successfully used in large, long-term
trials of mortality or major nonfatal endpoints, its use is growing
in other circumstances. In particular, it is recognized that safety
must be monitored in all trials; therefore, the need for formal
procedures to cover early stopping for safety reasons should always
be considered.
3.5 Sample Size
The number of subjects in a clinical trial should always be
large enough to provide a reliable answer to the questions
addressed. This number is usually determined by the primary
objective of the trial. If the sample size is determined on some
other basis, then this should be made clear and justified. For
example, a trial sized on the basis of safety questions or
requirements or important secondary objectives may need larger
numbers of subjects than a trial sized on the basis of the primary
efficacy question (see, for example, ICH E1A).
Using the usual method for determining the appropriate sample
size, the following items should be specified: A primary variable;
the test statistic; the null hypothesis; the alternative
(``working'') hypothesis at the chosen dose(s) (embodying
consideration of the treatment difference to be detected or rejected
at the dose and in the subject population selected); the probability
of erroneously rejecting the null hypothesis (the Type I error) and
the probability of erroneously failing to reject the null hypothesis
(the Type II error); as well as the approach to dealing with
treatment withdrawals and protocol violations. In some instances,
the event rate is of primary interest for evaluating power, and
assumptions should be made to extrapolate from the required number
of events to the eventual sample size for the trial.
The method by which the sample size is calculated should be
given in the protocol, together with the estimates of any quantities
used in the calculations (such as variances, mean values, response
rates, event rates, difference to be detected). The basis of these
estimates should also be given. It is important to investigate the
sensitivity of the sample size estimate to a variety of deviations
from these assumptions and this may be facilitated by providing a
range of sample sizes appropriate for a reasonable range of
deviations from assumptions. In confirmatory trials, assumptions
should normally be based on published data or on the results of
earlier trials. The treatment difference to be detected may be based
on a judgment concerning the minimal effect which has clinical
relevance in the management of patients or on a judgment concerning
the anticipated effect of the new treatment, where this is larger.
Conventionally, the probability of Type I error is set at 5 percent
or less or as dictated by any adjustments made necessary for
multiplicity considerations; the precise choice may be influenced by
the prior plausibility of the hypothesis under test and the desired
impact of the results. The probability of Type II error is
conventionally set at 10 percent to 20 percent. It is in the
sponsor's interest to keep this figure as low as feasible,
especially in the case of trials that are difficult or impossible to
repeat. Alternative values to the conventional levels of Type I and
Type II error may be acceptable or even preferable in some cases.
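
By way of illustration only, the sketch below computes an approximate per-group sample size for a two-sided comparison of two means using the standard normal approximation; the difference to be detected, the standard deviation, and the error rates are assumptions for the example, not recommended values.

    from scipy import stats

    def two_sample_size(delta, sd, alpha=0.05, power=0.9):
        # Approximate subjects per group for a two-sided test comparing two
        # means (normal approximation); all inputs are planning assumptions.
        z_alpha = stats.norm.ppf(1 - alpha / 2)
        z_beta = stats.norm.ppf(power)
        return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

    # e.g., detect a difference of 5 units with SD 12, 5 percent two-sided
    # Type I error, and 10 percent Type II error (90 percent power).
    print(round(two_sample_size(delta=5.0, sd=12.0)))   # roughly 121 per group
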
Sample size calculations should refer to the number of subjects
required for the primary analysis. If this is the ``full analysis
set,'' estimates of the effect size may need to be reduced compared
to the per protocol set (see Glossary). This is to allow for the
dilution of the treatment effect arising from the inclusion of data
from patients who have withdrawn from treatment or whose compliance
is poor. The assumptions about variability may also need to be
revised.
The sample size of an equivalence trial or a noninferiority
trial (see section 3.3.2) should normally be based on the objective
of obtaining a confidence interval for the treatment difference that
shows that the treatments differ at most by a clinically acceptable
difference. When the power of an equivalence trial is assessed at a
true difference of zero, then the sample size necessary to achieve
this power is underestimated if the true difference is not zero.
When the power of a noninferiority trial is assessed at a zero
difference, then the sample size needed to achieve that power will
be underestimated if the effect of the investigational product is
less than that of the active control. The choice of a ``clinically
acceptable'' difference needs justification with respect to its
meaning for future patients, and may be smaller than the
``clinically relevant'' difference referred to above in the context
of superiority trials designed to establish that a difference
exists.
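
The point about assessing power at a nonzero true difference can be illustrated with a short sketch for a noninferiority comparison of two means; the margin, assumed true difference, standard deviation, and error rates are hypothetical.

    from scipy import stats

    def noninferiority_n(margin, true_diff, sd, alpha=0.025, power=0.9):
        # Per-group sample size (normal approximation); margin is the lower
        # noninferiority margin (negative), true_diff the assumed true
        # difference (investigational product minus control).
        z_a = stats.norm.ppf(1 - alpha)
        z_b = stats.norm.ppf(power)
        return 2 * ((z_a + z_b) * sd / (true_diff - margin)) ** 2

    print(round(noninferiority_n(margin=-3.0, true_diff=0.0, sd=8.0)))
    # A larger sample size is needed if the investigational product is
    # assumed to be slightly less effective than the active control:
    print(round(noninferiority_n(margin=-3.0, true_diff=-1.0, sd=8.0)))
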
The exact sample size in a group sequential trial cannot be
fixed in advance because it depends upon the play of chance in
combination with the chosen stopping guideline and the true
treatment difference. The design of the stopping guideline should
take into account the consequent distribution of the sample size,
usually embodied in the expected and maximum sample sizes.
When event rates are lower than anticipated or variability is
larger than expected, methods for sample size reestimation are
available without unblinding data or making treatment comparisons
(see section 4.4).
3.6 Data Capture and Processing
The collection of data and transfer of data from the
investigator to the sponsor can take place through a variety of
media, including paper case record forms, remote site monitoring
systems, medical computer systems, and electronic transfer. Whatever
data capture instrument is used, the form and content of the
information collected should be in full accordance with the protocol
and should be established in advance of the conduct of the clinical
trial. It should focus on the data necessary to implement the
planned analysis, including the context information (such as timing
assessments relative to dosing) necessary to confirm protocol
compliance or identify important protocol deviations. ``Missing
values'' should be distinguishable from the ``value zero'' or
``characteristic absent.''
The process of data capture, through to database finalization,
should be carried out in accordance with good clinical practice
(GCP) (see ICH E6, section 5). Specifically, timely and reliable
processes for recording data and rectifying errors and omissions are
necessary to ensure delivery of a quality database and the
achievement of the trial objectives through the implementation of
the planned analysis.
IV. Trial Conduct Considerations
4.1 Trial Monitoring and Interim Analysis
Careful conduct of a clinical trial according to the protocol
has a major impact on the credibility of the results (see ICH E6).
Careful monitoring can ensure that difficulties are noticed early
and their occurrence or recurrence minimized.
There are two distinct types of monitoring that generally
characterize confirmatory clinical trials sponsored by the
pharmaceutical industry. One type of monitoring concerns the
oversight of the quality of the trial, while the other type involves
breaking the blind to make treatment comparisons (i.e., interim
analysis). Both types of trial monitoring, in addition to entailing
different staff responsibilities, involve access to different types
of trial data and information, and thus different principles apply
for the control of potential statistical and operational bias.
For the purpose of overseeing the quality of the trial, the
checks involved in trial monitoring may include whether the protocol
is being followed, the acceptability of data being accrued, the
success of planned accrual targets, the appropriateness of the
design assumptions, success in keeping patients in the trials, and
so on (see sections 4.2 to 4.4). This type of monitoring does not
require access to information on comparative treatment effects nor
unblinding of data and, therefore, has no impact on Type I error.
The monitoring of a trial for this purpose is the responsibility of
the sponsor (see ICH E6) and can be carried out by the sponsor or an
independent group selected by the sponsor. The period for this type
of monitoring usually starts with the selection of the trial sites
and ends with the collection and cleaning of the last subject's
data.
The other type of trial monitoring (interim analysis) involves
the accruing of comparative treatment results. Interim analysis
requires unblinded (i.e., key breaking) access to treatment group
assignment (actual treatment assignment or identification of group
assignment) and comparative treatment group summary information.
Therefore, the protocol (or appropriate amendments prior to a first
analysis) should contain statistical plans for the interim analysis
to prevent certain types of bias. This is discussed in sections 4.5
and 4.6.
4.2 Changes in Inclusion and Exclusion Criteria
Inclusion and exclusion criteria should remain constant, as
specified in the protocol, throughout the period of subject
recruitment.
Changes may occasionally be appropriate, for example, in long-term
trials, where growing medical knowledge either from outside the
trial or from interim analyses may suggest a change of entry
criteria. Changes may also result from the discovery by monitoring
staff that regular violations of the entry criteria are occurring or
that seriously low recruitment rates are due to over-restrictive
criteria. Changes should be made without breaking the blind and
should always be described by a protocol amendment. This amendment
should cover any statistical consequences, such as sample size
adjustments arising from different event rates, or modifications to
the planned analysis, such as stratifying the analysis according to
modified inclusion/exclusion criteria.
4.3 Accrual Rates
In trials with a long time-scale for the accrual of subjects,
the rate of accrual should be monitored. If it falls appreciably
below the projected level, the reasons should be identified and
remedial actions taken to protect the power of the trial and
alleviate concerns about selective entry and other aspects of
quality. In a multicenter trial, these considerations apply to the
individual centers.
4.4 Sample Size Adjustment
In long-term trials there will usually be an opportunity to
check the assumptions which underlie the original design and sample
size calculations. This may be particularly important if the trial
specifications have been made on preliminary and/or uncertain
information. An interim check conducted on the blinded data may
reveal that overall response variances, event rates or survival
experience are not as anticipated. A revised sample size may then be
calculated using suitably modified assumptions, and should be
justified and documented in a protocol amendment and in the clinical
study report. The steps taken to preserve blindness and the
consequences, if any, for the Type I error and the width of
confidence intervals should be explained. The potential need for re-
estimation of the sample size should be envisaged in the protocol
whenever possible (see section 3.5).
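
One simple way such a blinded check might be sketched, purely for illustration and without prescribing any particular method, is to recompute the planned sample size using the overall standard deviation of the pooled, treatment-blind interim data; the interim values, the difference to be detected, and the error rates below are assumptions.

    import numpy as np
    from scipy import stats

    def reestimated_per_group_n(blinded_values, delta, alpha=0.05, power=0.9):
        # The pooled, treatment-blind SD replaces the planning value, while
        # delta and the error rates stay as specified in the protocol. This
        # simple version ignores the slight inflation of the blinded SD that
        # any true treatment difference would cause.
        sd_blind = np.std(np.asarray(blinded_values, dtype=float), ddof=1)
        z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
        return 2 * (z * sd_blind / delta) ** 2

    interim = [12.1, 9.8, 15.3, 11.0, 13.4, 8.7, 14.2, 10.5]  # blinded data
    print(round(reestimated_per_group_n(interim, delta=2.0)))
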
4.5 Interim Analysis and Early Stopping
An interim analysis is any analysis intended to compare
treatment arms with respect to efficacy or safety at any time prior
to formal completion of a trial. Because the number, methods, and
consequences of these comparisons affect the interpretation of the
trial, all interim analyses should be carefully planned in advance
and described in the protocol. Special circumstances may dictate the
need for an interim analysis that was not defined at the start of a
trial. In these cases, a protocol amendment describing the interim
analysis should be completed prior to unblinded access to treatment
comparison data. When an interim analysis is planned with the
intention of deciding whether or not to terminate a trial, this is
usually accomplished by the use of a group sequential design that
employs statistical monitoring schemes as guidelines (see section
3.4). The goal of such an interim analysis is to stop the trial
early if the superiority of the treatment under study is clearly
established, if the demonstration of a relevant treatment difference
has become unlikely, or if unacceptable adverse effects are
apparent. Generally, boundaries for monitoring efficacy require more
evidence to terminate a trial early (i.e., they are more
conservative) than boundaries for monitoring safety. When the trial
design and monitoring objective involve multiple endpoints, then
this aspect of multiplicity may also need to be taken into account.
The protocol should describe the schedule of interim analyses
or, at least, the considerations that will govern its generation,
for example, if flexible alpha spending function approaches are to
be employed. Further details may be given in a protocol amendment
before the time of the first interim analysis. The stopping
guidelines and their properties should be clearly described in the
protocol or amendments. The potential effects of early stopping on
the analysis of other important variables should also be considered.
This material should be written or approved by the data monitoring
committee (see section 4.6), when the trial has one. Deviations from
the planned procedure always bear the potential of invalidating the
trial results. If it becomes necessary to make changes to the trial,
any consequent changes to the statistical procedures should be
specified in an amendment to the protocol at the earliest
opportunity, especially discussing the impact on any analysis and
inferences that such changes may cause. The procedures selected
should always ensure that the overall probability of Type I error is
controlled.
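
As one concrete example of a flexible alpha spending approach (one common choice among several, not a required method), a Lan-DeMets O'Brien-Fleming-type spending function can be evaluated directly from its formula; the information fractions used below are assumptions for illustration.

    import numpy as np
    from scipy import stats

    def obf_spending(t, alpha=0.05):
        # Cumulative two-sided Type I error allowed to be "spent" by
        # information fraction t (0 < t <= 1), O'Brien-Fleming type.
        return 2.0 - 2.0 * stats.norm.cdf(
            stats.norm.ppf(1 - alpha / 2) / np.sqrt(t))

    # Three equally spaced interim looks plus the final analysis.
    for t in (0.25, 0.5, 0.75, 1.0):
        print(f"t = {t:.2f}: cumulative alpha spent = {obf_spending(t):.4f}")
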
The execution of an interim analysis should be a completely
confidential process because unblinded data and results are
potentially involved. All staff involved in the conduct of the trial
should remain blind to the results of such analyses, because of the
possibility that their attitudes to the trial will be modified and
cause changes in the characteristics of patients to be recruited or
biases in treatment comparisons. This principle may be applied to
all investigator staff and to staff employed by the sponsor except
for those who are directly involved in the execution of the interim
analysis. Investigators should be informed only about the decision
to continue or to discontinue the trial, or to implement
modifications to trial procedures.
Most clinical trials intended to support the efficacy and safety
of an investigational product should proceed to full completion of
planned sample size accrual; trials should be stopped early only for
ethical reasons or if the power is no longer acceptable. However, it
is recognized that drug development plans involve the need for
sponsor access to comparative treatment data for a variety of
reasons, such as planning other trials. It is also recognized that
only a subset of trials will involve the study of serious life-
threatening outcomes or mortality which may need sequential
monitoring of accruing comparative treatment effects for ethical
reasons. In either of these situations, plans for interim
statistical analysis should be in place in the protocol or in
protocol amendments prior to the unblinded access to comparative
treatment data in order to deal with the potential statistical and
operational bias that may be introduced.
For many clinical trials of investigational products, especially
those that have major public health significance, the responsibility
for monitoring comparisons of efficacy and/or safety outcomes should
be assigned to an external independent group, often called an
independent data monitoring committee (IDMC), a data and safety
monitoring board, or a data monitoring committee, whose
responsibilities should be clearly described.
When a sponsor assumes the role of monitoring efficacy or safety
comparisons and therefore has access to unblinded comparative
information, particular care should be taken to protect the
integrity of the trial and to manage and limit appropriately the
sharing of information. The sponsor should ensure and document that
the internal monitoring committee has complied with written standard
operating procedures and that minutes of decisionmaking meetings,
including records of interim results, are maintained.
Any interim analysis that is not planned appropriately (with or
without the consequences of stopping the trial early) may flaw the
results of a trial and possibly weaken confidence in the conclusions
drawn. Therefore, such analyses should be avoided. If unplanned
interim analysis is conducted, the clinical study report should
explain why it was necessary and the degree to which blindness had
to be broken, and provide an assessment of the potential magnitude
of bias introduced and the impact on the interpretation of the
results.
4.6 Role of Independent Data Monitoring Committee (IDMC) (see
sections 1.25 and 5.5.2 of ICH E6)
An IDMC may be established by the sponsor to assess at intervals
the progress of a clinical trial, safety data, and critical efficacy
variables and recommend to the sponsor whether to continue, modify
or terminate a trial. The IDMC should have written operating
procedures and maintain records of all its meetings, including
interim results; these should be available for review when the trial
is complete. The independence of the IDMC is intended to control the
sharing of important comparative information and to protect the
integrity of the clinical trial from adverse impact resulting from
access to trial information. The IDMC is a separate entity from an
institutional review board (IRB) or an independent ethics committee
(IEC), and its composition should include clinical trial scientists
knowledgeable in the appropriate disciplines, including statistics.
When there are sponsor representatives on the IDMC, their role
should be clearly defined in the operating procedures of the
committee (for example, covering whether or not they can vote on key
issues). Since these sponsor staff would have access to unblinded
information, the procedures should also address the control of
dissemination of interim trial results within the sponsor
organization.
V. Data Analysis Considerations
5.1 Prespecification of the Analysis
When designing a clinical trial, the principal features of the
eventual statistical analysis of the data should be described in the
statistical section of the protocol. This section should include all
the principal features of the proposed confirmatory analysis of the
primary variable(s) and the way in which anticipated analysis
problems will be handled. In the case of exploratory trials, this
section could describe more general principles and directions.
The statistical analysis plan (see Glossary) may be written as a
separate document to be completed after finalizing the protocol. In
this document, a more technical and detailed elaboration of the
principal features stated in the protocol may be included (see
section 7.1). The plan may include detailed procedures for executing
the statistical analysis of the primary and secondary variables and
other data. The plan should be reviewed and possibly updated as a
result of the blind review of the data (see section 7.1 for
definition) and should be finalized before breaking the blind.
Formal records should be kept of when the statistical analysis plan
was finalized as well as when the blind was subsequently broken.
If the blind review suggests changes to the principal features
stated in the protocol, these should be documented in a protocol
amendment. Otherwise, it should suffice to update the statistical
analysis plan with the considerations suggested from the blind
review. Only results from analyses envisaged in the protocol
(including amendments) can be regarded as confirmatory.
In the statistical section of the clinical study report, the
statistical methodology should be clearly described including when
in the clinical trial process methodology decisions were made (see
ICH E3).
5.2 Analysis Sets
The set of subjects whose data are to be included in the main
analyses should be defined in the statistical section of the
protocol. In addition, documentation for all subjects for whom trial
procedures (e.g., run-in period) were initiated may be useful. The
content of this subject documentation depends on detailed features
of the particular trial, but at least demographic and baseline data
on disease status should be collected whenever possible.
If all subjects randomized into a clinical trial satisfied all
entry criteria, followed all trial procedures perfectly with no
losses to followup, and provided complete data records, then the set
of subjects to be included in the analysis would be self-evident.
The design and conduct of a trial should aim to approach this ideal
as closely as possible, but, in practice, it is doubtful if it can
ever be fully achieved. Hence, the statistical section of the
protocol should address anticipated problems prospectively in terms
of how these affect the subjects and data to be analyzed. The
protocol should also specify procedures aimed at minimizing any
anticipated irregularities in study conduct that might impair a
satisfactory analysis, including various types of protocol
violations, withdrawals and missing values. The protocol should
consider ways both to reduce the frequency of such problems and to
handle the problems that do occur in the analysis of data. Possible
amendments to the way in which the analysis will deal with protocol
violations should be identified during the blind review. It is
desirable to identify any important protocol violation with respect
to the time when it occurred, its cause, and its influence on the
trial result. The frequency and type of protocol violations, missing
values, and other problems should be documented in the clinical
study report and their potential influence on the trial results
should be described (see ICH E3).
Decisions concerning the analysis set should be guided by the
following principles: (1) To minimize bias and (2) to avoid
inflation of Type I error.
5.2.1 Full Analysis Set
The intention-to-treat (see Glossary) principle implies that the
primary analysis should include all randomized subjects. Compliance
with this principle would necessitate complete followup of all
randomized subjects for study outcomes. In practice, this ideal may
be difficult to achieve, for reasons to be described. In this
document, the term ``full analysis set'' is used to describe the
analysis set which is as complete as possible and as close as
possible to the intention-to-treat ideal of including all randomized
subjects. Preservation of the initial randomization in analysis is
important in preventing bias and in providing a secure foundation
for statistical tests. In many clinical trials, the use of the full
analysis set provides a conservative strategy. Under many
circumstances, it may also provide estimates of treatment effects
that are more likely to mirror those observed in subsequent
practice.
There are a limited number of circumstances that might lead to
excluding randomized subjects from the full analysis set, including
the failure to satisfy major entry criteria (eligibility
violations), the failure to take at least one dose of trial
medication, and the lack of any data post randomization. Such
exclusions should always be justified. Subjects who fail to satisfy
an entry criterion may be excluded from the analysis without the
possibility of introducing bias only under the following
circumstances:
(i) The entry criterion was measured prior to randomization.
(ii) The detection of the relevant eligibility violations can be
made completely objectively.
(iii) All subjects receive equal scrutiny for eligibility
violations. (This may be difficult to ensure in an open-label study,
or even in a double-blind study if the data are unblinded prior to
this scrutiny, emphasizing the importance of the blind review.)
(iv) All detected violations of the particular entry criterion
are excluded.
In some situations, it may be reasonable to eliminate from the
set of all randomized subjects any subject who took no trial
medication. The intention-to-treat principle would be preserved
despite the exclusion of these patients provided, for example, that
the decision of whether or not to begin treatment could not be
influenced by knowledge of the assigned treatment. In other
situations it may be necessary to eliminate from the set of all
randomized subjects any subject without data post randomization. No
analysis should be considered complete unless the potential biases
arising from these specific exclusions, or any others, are
addressed.
When the full analysis set of subjects is used, violations of
the protocol that occur after randomization may have an impact on
the data and conclusions, particularly if their occurrence is
related to treatment assignment. In most respects, it is appropriate
to include the data from such subjects in the analysis, consistent
with the intention-to-treat principle. Special problems arise in
connection with subjects withdrawn from treatment after receiving
one or more doses who provide no data after this point, and subjects
otherwise lost to followup, because failure to include these
subjects in the full analysis set may seriously undermine the
approach. Measurements of primary variables made at the time of the
loss to follow-up of a subject for any reason, or subsequently
collected in accordance with the intended schedule of assessments in
the protocol, are valuable in this context; subsequent collection is
especially important in studies where the primary variable is
mortality or serious morbidity. The intention to collect data in
this way should be described in the protocol. Imputation techniques,
ranging from the carrying forward of the last observation to the use
of complex mathematical models, may also be used in an attempt to
compensate for missing data. Other methods employed to ensure the
availability of measurements of primary variables for every subject
in the full analysis set may require some assumptions about the
subjects' outcomes or a simpler choice of outcome (e.g., success/
failure). The use of any of these strategies should be described and
justified in the statistical section of the protocol, and the
assumptions underlying any mathematical models employed should be
clearly explained. It is also important to demonstrate the
robustness of the corresponding results of analysis, especially when
the strategy in question could itself lead to biased estimates of
treatment effects.
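
A sketch of the simplest imputation technique mentioned above, carrying forward the last observation, is given below using pandas on hypothetical longitudinal data; as noted in the text, the suitability of this or any other strategy should be justified and its assumptions explained.

    import pandas as pd

    # Illustrative data: one row per subject and visit, with a missing
    # post-baseline value for subject 102 (all values are assumptions).
    data = pd.DataFrame({
        "subject":  [101, 101, 101, 102, 102, 102],
        "visit":    [1, 2, 3, 1, 2, 3],
        "response": [10.0, 12.0, 13.0, 9.0, 11.0, None],
    })

    # Last observation carried forward: within each subject, a missing value
    # is replaced by that subject's most recent available observation.
    data["response_locf"] = (
        data.sort_values(["subject", "visit"])
            .groupby("subject")["response"]
            .ffill()
    )
    print(data)
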
Because of the unpredictability of some problems, it may
sometimes be preferable to defer detailed consideration of the
manner of dealing with irregularities until the blind review of the
data at the end of the trial, and, if so, this should be stated in
the protocol.
5.2.2 Per Protocol Set
The ``per protocol'' set of subjects, sometimes described as the
``valid cases,'' the ``efficacy'' sample, or the ``evaluable
subjects'' sample, defines a subset of the subjects in the full
analysis set who are more compliant with the protocol and is
characterized by criteria such as the following:
(i) The completion of a certain prespecified minimal exposure to
the treatment regimen;
(ii) The availability of measurements of the primary
variable(s);
(iii) The absence of any major protocol violations, including
the violation of entry criteria.
The precise reasons for excluding subjects from the per protocol
set should be fully defined and documented before breaking the blind
in a manner appropriate to the circumstances of the specific trial.
The use of the per protocol set may maximize the opportunity for
a new treatment to show additional efficacy in the analysis, and
most closely reflects the scientific model underlying the protocol.
However, the corresponding test of the hypothesis and estimate of
the treatment effect may or may not be conservative, depending on
the trial. The bias, which may be severe, arises from the fact that
adherence to the study protocol may be related to treatment and
outcome.
The problems that lead to the exclusion of subjects to create
the per protocol set, and other protocol violations, should be fully
identified and summarized. Relevant protocol violations may include
errors in treatment assignment, the use of excluded medication, poor
compliance, loss to followup, and missing data. It is good practice
to assess the pattern of such problems among the treatment groups
with respect to frequency and time to occurrence.
5.2.3 Roles of the Different Analysis Sets
In general, it is advantageous to demonstrate a lack of
sensitivity of the principal trial results to alternative choices of
the set of subjects analyzed. In confirmatory trials, it is usually
appropriate to plan to conduct both an analysis of the full analysis
set and a per protocol analysis, so that any differences between
them can be the subject of explicit discussion and interpretation.
In some cases, it may be desirable to plan further exploration of
the sensitivity of conclusions to the choice of the set of subjects
analyzed. When the full analysis set and the per protocol set lead
to essentially the same conclusions, confidence in the trial results
is increased, bearing in mind, however, that the need to exclude a
substantial proportion of subjects from the per protocol analysis
throws some doubt on the overall validity of the trial.
The full analysis set and the per protocol set play different
roles in superiority trials (which seek to show the investigational
product to be superior) and in equivalence or noninferiority trials
(which seek to show the investigational product to be comparable,
see section 3.3.2). In superiority trials, the full analysis set is
used in the primary analysis (apart from exceptional circumstances)
because it tends to avoid over-optimistic estimates of efficacy
resulting from a per protocol analysis. This is because the
noncompliers included in the full analysis set will generally
diminish the estimated treatment effect. However, in an equivalence
or noninferiority trial, use of the full analysis set is generally
not conservative and its role should be considered very carefully.
5.3 Missing Values and Outliers
Missing values represent a potential source of bias in a
clinical trial. Hence, every effort should be undertaken to fulfill
all the requirements of the protocol concerning the collection and
management of data. In reality, however, there will almost always be
some missing data. A trial may be regarded as valid, nonetheless,
provided the methods of dealing with missing values are sensible,
particularly if those methods are predefined in the protocol.
Definition of methods may be refined by updating this aspect in the
statistical analysis plan during the blind review. Unfortunately, no
universally applicable methods of handling missing values can be
recommended. An investigation should be made concerning the
sensitivity of the results of analysis to the method of handling
missing values, especially if the number of missing values is
substantial.
A similar approach should be adopted to exploring the influence
of outliers, the statistical definition of which is, to some extent,
arbitrary. Clear identification of a particular value as an outlier
is most convincing when justified medically as well as
statistically, and the medical context will then often define the
appropriate action. Any outlier procedure set out in the protocol or
the statistical analysis plan should be such as not to favor any
treatment group a priori. Once again, this aspect of the analysis
can be usefully updated during blind review. If no procedure for
dealing with outliers was foreseen in the trial protocol, one
analysis with the actual values and at least one other analysis
eliminating or reducing the outlier effect should be performed and
differences between their results discussed.
5.4 Data Transformation
The decision to transform key variables prior to analysis is
best made during the design of the trial on the basis of similar
data from earlier clinical trials. Transformations (e.g., square
root, logarithm) should be specified in the protocol and a rationale
provided, especially for the primary variable(s). The general
principles guiding the use of transformations to ensure that the
assumptions underlying the statistical methods are met are to be
found in standard texts; conventions for particular variables have
been developed in a number of specific clinical areas. The decision
on whether and how to transform a variable should be influenced by
the preference for a scale that facilitates clinical interpretation.
Similar considerations apply to other derived variables, such as
the use of change from baseline, percentage change from baseline,
the ``area under the curve'' of repeated measures, or the ratio of
two different variables. Subsequent clinical interpretation should
be carefully considered, and the derivation should be justified in
the protocol. Closely related points are made in section 2.2.2.
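
A short sketch of such transformations and derived variables on hypothetical data follows; the variable names, values, and choice of the logarithmic scale are assumptions made only for the example.

    import numpy as np
    import pandas as pd

    # Illustrative data: a positively skewed primary variable measured at
    # baseline and at the end of treatment.
    df = pd.DataFrame({
        "baseline": [120.0, 90.0, 200.0, 150.0],
        "week12":   [100.0, 85.0, 160.0, 155.0],
    })

    # Prespecified log transformation: differences of logs correspond to
    # ratios, which may aid clinical interpretation on a multiplicative scale.
    df["log_ratio"] = np.log(df["week12"]) - np.log(df["baseline"])

    # Commonly used derived variables: change and percentage change from
    # baseline.
    df["change"] = df["week12"] - df["baseline"]
    df["pct_change"] = 100.0 * df["change"] / df["baseline"]
    print(df)
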
5.5 Estimation, Confidence Intervals, and Hypothesis Testing
The statistical section of the protocol should specify the
hypotheses that are to be tested and/or the treatment effects that
are to be estimated in order to satisfy the primary objectives of
the trial. The statistical methods to be used to accomplish these
tasks should be described for the primary (and preferably the
secondary) variables, and the underlying statistical model should be
made clear. Estimates of treatment effects should be accompanied by
confidence intervals, whenever possible, and the way in which these
will be calculated should be identified. A description should be
given of any intentions to use baseline data to improve precision or
to adjust estimates for potential baseline differences, for example,
by means of analysis of covariance.
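
An illustrative analysis-of-covariance sketch on simulated data is shown below, reporting the baseline-adjusted treatment effect with a two-sided confidence interval; the simulated values and the use of statsmodels are assumptions for the example.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated trial data (illustrative only): baseline value, randomized
    # treatment, and an outcome related to both.
    rng = np.random.default_rng(1)
    n = 100
    baseline = rng.normal(50.0, 10.0, size=n)
    treatment = rng.integers(0, 2, size=n)
    outcome = (5.0 + 0.8 * baseline - 4.0 * treatment
               + rng.normal(0.0, 8.0, size=n))
    df = pd.DataFrame({"baseline": baseline, "treatment": treatment,
                       "outcome": outcome})

    # Analysis of covariance: the treatment effect estimated with adjustment
    # for the baseline measurement, accompanied by its 95% confidence interval.
    fit = smf.ols("outcome ~ treatment + baseline", data=df).fit()
    print(fit.params["treatment"])          # adjusted treatment effect
    print(fit.conf_int().loc["treatment"])  # two-sided 95% confidence interval
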
It is important to clarify whether one- or two-sided tests of
statistical significance will be used and, in particular, to justify
prospectively the use of one-sided tests. If hypothesis tests are
not considered appropriate, then the alternative process for
arriving at statistical conclusions should be given. The issue of
one-sided or two-sided approaches to inference is controversial, and
a diversity of views can be found in the statistical literature. The
approach of setting Type I errors for one-sided tests at half the
conventional Type I error used in two-sided tests is preferable in
regulatory settings. This promotes consistency with the two-sided
confidence intervals that are generally appropriate for estimating
the possible size of the difference between two treatments.
The particular statistical model chosen should reflect the
current state of medical and statistical knowledge about the
variables to be analyzed as well as the statistical design of the
trial. All effects to be fitted in the analysis (for example, in
analysis of variance models) should be fully specified, and the
manner, if any, in which this set of effects might be modified in
response to preliminary results should be explained. The same
considerations apply to the set of covariates fitted in an analysis
of covariance. (See also section 5.7.) In the choice of statistical
methods, due attention should be paid to the statistical
distribution of both primary and secondary variables. When making
this choice (for example between parametric and nonparametric
methods), it is important to bear in mind the need to provide
statistical estimates of the size of treatment effects together with
confidence intervals (in addition to significance tests).
The primary analysis of the primary variable should be clearly
distinguished from supporting analyses of the primary or secondary
variables. Within the statistical section of the protocol or the
statistical analysis plan there should also be an outline of the way
in which data other than the primary and secondary variables will be
summarized and reported. This should include a reference to any
approaches adopted for the purpose of achieving consistency of
analysis across a range of trials, for example, for safety data.
Modeling approaches that incorporate information on known
pharmacological parameters, the extent of protocol compliance for
individual subjects, or other biologically based data may provide
valuable insights into actual or potential efficacy, especially with
regard to estimation of treatment effects. The assumptions
underlying such models should always be clearly identified, and the
limitations of any conclusions should be carefully described.
5.6 Adjustment of Significance and Confidence Levels
When multiplicity is present, the usual frequentist approach to
the analysis of
clinical trial data may necessitate an adjustment to the Type I
error. Multiplicity may arise, for example, from multiple primary
variables (see section 2.2.2), multiple comparisons of treatments,
repeated evaluation over time, and/or interim analyses (see section
4.5). Methods to avoid or reduce multiplicity are sometimes
preferable when available, such as the identification of the key
primary variable (multiple variables), the choice of a critical
treatment contrast (multiple comparisons), and the use of a summary
measure such as ``area under the curve'' (repeated measures). In
confirmatory analyses, any aspects of multiplicity that remain after
steps of this kind have been taken should be identified in the
protocol; adjustment should always be considered and the details of
any adjustment procedure or an explanation of why adjustment is not
thought to be necessary should be set out in the analysis plan.
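
As one example of an adjustment procedure (by no means the only acceptable one), Holm's step-down method can be applied to a set of p-values for multiple comparisons; the p-values below are hypothetical.

    def holm_adjust(pvalues):
        # Holm step-down adjustment controlling the familywise Type I error;
        # each adjusted p-value is compared with the nominal alpha.
        m = len(pvalues)
        order = sorted(range(m), key=lambda i: pvalues[i])
        adjusted = [0.0] * m
        running_max = 0.0
        for rank, i in enumerate(order):
            running_max = max(running_max, (m - rank) * pvalues[i])
            adjusted[i] = min(1.0, running_max)
        return adjusted

    print(holm_adjust([0.012, 0.04, 0.30]))  # compare each with, e.g., 0.05
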
5.7 Subgroups, Interactions, and Covariates
The primary variable(s) is often systematically related to other
influences apart from treatment. For example, there may be
relationships to covariates such as age and sex, or there may be
differences between specific subgroups of subjects, such as those
treated at the different centers of a multicenter trial. In some
instances, an adjustment for the influence of covariates or for
subgroup effects is an integral part of the planned analysis and
hence should be set out in the protocol. Pretrial deliberations
should identify those covariates and factors expected to have an
important influence on the primary variable(s), and should consider
how to account for these in the analysis to improve precision and to
compensate for any lack of balance between treatment groups. If one
or more factors are used to stratify the design, it is appropriate
to account for those factors in the analysis. When the potential
value of an adjustment is in doubt, it is often advisable to
nominate the unadjusted analysis as the one for primary attention,
the adjusted analysis being supportive. Special attention should be
paid to center effects and to the role of baseline measurements of
the primary variable. It is not advisable to adjust the main
analyses for covariates measured after randomization because they
may be affected by the treatments.
The treatment effect itself may also vary with subgroup or
covariate--for example, the effect may decrease with age or may be
larger in a particular diagnostic category of subjects. In some
cases such interactions are anticipated or are of particular prior
interest (e.g., geriatrics); hence a subgroup analysis or a
statistical model including interactions is part of the planned
confirmatory analysis. In most cases, however, subgroup or
interaction analyses are exploratory and should be clearly
identified as such; they should explore the uniformity of any
treatment effects found overall. In general, such analyses should
proceed first through the addition of interaction terms to the
statistical model in question, complemented by additional
exploratory analysis within relevant subgroups of subjects, or
within strata defined by the covariates. When exploratory, these
analyses should be interpreted cautiously. Any conclusion of
treatment efficacy (or lack thereof) or safety based solely on
exploratory subgroup analyses is unlikely to be accepted.
5.8 Integrity of Data and Computer Software Validity
The credibility of the numerical results of the analysis depends
on the quality and validity of the methods and software (both
internally and externally written) used both for data management
(data entry, storage, verification, correction, and retrieval) and
for processing the data statistically. Data management activities
should therefore be based on thorough and effective standard
operating procedures. The computer software used for data management
and statistical analysis should be reliable, and documentation of
appropriate software testing procedures should be available.
VI. Evaluation of Safety and Tolerability
6.1 Scope of Evaluation
In all clinical trials, evaluation of safety and tolerability
(see Glossary) constitutes an important element. In early phases
this evaluation is mostly of an exploratory nature and is only
sensitive to frank expressions of toxicity, whereas in later phases
the establishment of the safety and tolerability profile of a drug
can be characterized more fully in larger samples of subjects. Later
phase controlled trials represent an important means of exploring,
in an unbiased manner, any new potential adverse effects, even if
such trials generally lack power in this respect.
Certain trials may be designed with the purpose of making
specific claims about superiority or equivalence with regard to
safety and tolerability compared to another drug or to another dose
of the investigational drug. Such specific claims should be
supported by relevant evidence from confirmatory trials, similar to
that necessary for corresponding efficacy claims.
6.2 Choice of Variables and Data Collection
In any clinical trial, the methods and measurements chosen to
evaluate the safety and tolerability of a drug will depend on a
number of factors, including knowledge of the adverse effects of
closely related drugs, information from nonclinical and earlier
clinical trials and possible consequences of the pharmacodynamic/
pharmacokinetic properties of the particular drug, the mode of
administration, the type of subjects to be studied, and the duration
of the trial. Laboratory tests concerning clinical chemistry and
hematology, vital signs, and clinical adverse events (diseases,
signs, and symptoms) usually form the main body of the safety and
tolerability data. The occurrence of serious adverse events and
treatment discontinuations due to adverse events are particularly
important to register (see ICH E2A and ICH E3).
Furthermore, it is recommended that a consistent methodology be
used for the data collection and evaluation throughout a clinical
trial program to facilitate the combining of data from different
trials. The use of a common adverse event dictionary is particularly
important. This dictionary has a structure that makes it possible to
summarize the adverse event data on three different levels: System-
organ class, preferred term, or included term (see Glossary). The
preferred term is the level on which adverse events usually are
summarized, and preferred terms belonging to the same system-organ
class could then be brought together in the descriptive presentation
of data (see ICH M1).
6.3 Set of Subjects to be Evaluated and Presentation of Data
For the overall safety and tolerability assessment, the set of
subjects to be summarized is usually defined as those subjects who
received at least one dose of the investigational drug. Safety and
tolerability variables should be collected as comprehensively as
possible from these subjects, including type of adverse event,
severity, onset, and duration (see ICH E2B). Additional safety and
tolerability evaluations may be needed in specific subpopulations,
such as females, the elderly (see ICH E7), the severely ill, or
those who have a common concomitant treatment. These evaluations may
need to address more specific issues (see ICH E3).
All safety and tolerability variables will need attention during
evaluation, and the broad approach should be indicated in the
protocol. All adverse events should be reported, whether or not they
are considered to be related to treatment. All available data in the
study population should be accounted for in the evaluation.
Definitions of measurement units and reference ranges of laboratory
variables should be made with care; if different units or different
reference ranges appear in the same trial (e.g., if more than one
laboratory is involved), then measurements should be appropriately
standardized to allow a unified evaluation. Use of a toxicity
grading scale should be prespecified and justified.
The incidence of a certain adverse event is usually expressed in
the form of a proportion relating number of subjects experiencing
events to number of subjects at risk. However, it is not always
self-evident how to assess incidence. For example, depending on the
situation, the number of exposed subjects or the extent of exposure
(in person-years) could be considered for the denominator. Whether
the purpose of the calculation is to estimate a risk or to make a
comparison between treatment groups, it is important that the
definition is given in the protocol. This is especially important if
long-term treatment is planned and a substantial proportion of
treatment withdrawals or deaths are expected. For such situations,
survival analysis methods should be considered and cumulative
adverse event rates calculated in order to avoid the risk of
underestimation.
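As an illustration of the denominator choices described above, the
following sketch (Python, with hypothetical subject-level records)
computes both a crude incidence proportion and an exposure-adjusted rate
per person-year; the field names are assumptions for illustration only.

    # Illustrative sketch only: crude incidence proportion and
    # exposure-adjusted incidence rate for one adverse event term.
    # The record fields (had_event, exposure_years) are hypothetical.

    def incidence_summaries(subjects):
        """subjects: list of dicts with keys 'had_event' (bool) and
        'exposure_years' (float, time on treatment)."""
        n_at_risk = len(subjects)
        n_events = sum(1 for s in subjects if s["had_event"])
        total_person_years = sum(s["exposure_years"] for s in subjects)

        crude_proportion = n_events / n_at_risk               # events per subject exposed
        rate_per_person_year = n_events / total_person_years  # events per person-year
        return crude_proportion, rate_per_person_year


    # Hypothetical example: 3 of 8 subjects experience the event.
    subjects = [
        {"had_event": True,  "exposure_years": 0.5},
        {"had_event": False, "exposure_years": 1.0},
        {"had_event": True,  "exposure_years": 0.2},
        {"had_event": False, "exposure_years": 1.0},
        {"had_event": False, "exposure_years": 0.8},
        {"had_event": True,  "exposure_years": 0.9},
        {"had_event": False, "exposure_years": 1.0},
        {"had_event": False, "exposure_years": 0.7},
    ]
    print(incidence_summaries(subjects))  # approximately (0.375, 0.49 per person-year)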
In situations where there is substantial background noise of
signs and symptoms (e.g., in psychiatric trials), one should
consider ways of accounting for this in the estimation of risk for
different adverse events. One such method is to make use of the
``treatment emergent'' (see Glossary) concept in which adverse
events are recorded only if they emerge or worsen relative to
pretreatment baseline.
Other methods to reduce the effect of the background noise may
also be appropriate, such as ignoring adverse events of mild
severity or requiring that an event should have been observed at
repeated visits to qualify for inclusion in the numerator. Such
methods should be explained and justified in the protocol.
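A minimal sketch of the ``treatment emergent'' concept described above,
assuming hypothetical integer severity grades recorded at baseline and
on treatment (0 = absent, higher = worse), is given below for
illustration only.

    # Illustrative sketch of the "treatment emergent" concept: an adverse
    # event is counted only if it is absent at baseline and appears on
    # treatment, or worsens relative to the pretreatment baseline.
    # Severity grades are hypothetical integers (0 = absent, higher = worse).

    def is_treatment_emergent(baseline_grade: int, on_treatment_grade: int) -> bool:
        return on_treatment_grade > baseline_grade


    # Hypothetical examples
    print(is_treatment_emergent(0, 2))  # True: absent pretreatment, emerged on treatment
    print(is_treatment_emergent(1, 1))  # False: present at baseline, unchanged
    print(is_treatment_emergent(1, 3))  # True: worsened relative to baseline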
6.4 Statistical Evaluation
The investigation of safety and tolerability is a
multidimensional problem. Although some specific adverse effects can
usually be anticipated and specifically monitored for any drug, the
range of possible adverse effects is very large, and new and
unforeseeable effects are always possible. Further, an adverse event
experienced after a protocol violation, such as use of an excluded
medication, may introduce a bias. This background underlies the
statistical difficulties associated with the analytical evaluation
of safety and tolerability of drugs, and means that conclusive
information from confirmatory clinical trials is the exception
rather than the rule.
In most trials, the safety and tolerability implications are
best addressed by applying descriptive statistical methods to the
data, supplemented by calculation of confidence intervals wherever
this aids interpretation. It is also valuable to make use of
graphical presentations in which patterns of adverse events are
displayed both within treatment groups and within subjects.
The calculation of p-values is sometimes useful, either as an
aid to evaluating a specific difference of interest or as a
``flagging'' device applied to a large number of safety and
tolerability variables to highlight differences worthy of further
attention. This is particularly useful for laboratory data, which
otherwise can be difficult to summarize appropriately. It is
recommended that laboratory data be subjected to both a quantitative
analysis, e.g., evaluation of treatment means, and a qualitative
analysis in which the numbers of values above or below certain
thresholds are counted.
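For example, the following sketch illustrates the two complementary
summaries for a single laboratory variable, using hypothetical values
and a hypothetical upper reference limit: a quantitative summary of
treatment means and a qualitative count of values above the threshold.

    # Illustrative sketch: quantitative and qualitative summaries of one
    # laboratory variable by treatment group. Values and the upper
    # reference limit are hypothetical.

    from statistics import mean

    def summarize_lab(values_by_group, upper_limit):
        summary = {}
        for group, values in values_by_group.items():
            summary[group] = {
                "mean": mean(values),                                  # quantitative: treatment mean
                "n_above_limit": sum(v > upper_limit for v in values), # qualitative: count above threshold
                "n": len(values),
            }
        return summary


    lab_values = {
        "drug":    [22, 35, 80, 41, 130, 28],
        "placebo": [25, 30, 33, 45, 38, 27],
    }
    print(summarize_lab(lab_values, upper_limit=50))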
If hypothesis tests are used, statistical adjustments for
multiplicity to quantify the Type I error are appropriate, but the
Type II error is usually of more concern. Care should be taken when
interpreting putative statistically significant findings when there
is no multiplicity adjustment.
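Where such adjustments are used as a flagging aid, a standard procedure
such as the Holm step-down adjustment may be applied; the sketch below
is illustrative only and the p-values are hypothetical.

    # Illustrative sketch of the Holm step-down adjustment for multiple
    # hypothesis tests; the input p-values are hypothetical.

    def holm_adjust(p_values):
        m = len(p_values)
        order = sorted(range(m), key=lambda i: p_values[i])
        adjusted = [0.0] * m
        running_max = 0.0
        for rank, i in enumerate(order):
            adj = min(1.0, (m - rank) * p_values[i])
            running_max = max(running_max, adj)  # enforce monotonicity
            adjusted[i] = running_max
        return adjusted


    print(holm_adjust([0.01, 0.04, 0.03, 0.20]))  # [0.04, 0.09, 0.09, 0.20]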
In the majority of trials, investigators are seeking to
establish that there are no clinically unacceptable differences in
safety and tolerability compared with either a comparator drug or a
placebo. As is the case for noninferiority or equivalence evaluation
of efficacy, the use of confidence intervals is preferred to
hypothesis testing in this situation. In this way, the considerable
imprecision often arising from low frequencies of occurrence is
clearly demonstrated.
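As a simple illustration of this confidence interval approach, the
following sketch computes an approximate (Wald) 95 percent confidence
interval for the difference in the proportion of subjects experiencing
a given adverse event between two groups; the counts are hypothetical,
and more refined interval methods may be preferable when events are
rare.

    # Illustrative sketch: approximate 95% (Wald) confidence interval for
    # the difference in adverse event proportions between two treatment
    # groups. Counts are hypothetical; exact or continuity-corrected
    # methods may be preferable when event counts are small.

    from math import sqrt

    def risk_difference_ci(events_a, n_a, events_b, n_b, z=1.96):
        p_a = events_a / n_a
        p_b = events_b / n_b
        diff = p_a - p_b
        se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
        return diff, (diff - z * se, diff + z * se)


    # Hypothetical example: 12/200 events on drug versus 8/195 on placebo.
    print(risk_difference_ci(12, 200, 8, 195))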
6.5 Integrated Summary
The safety and tolerability properties of a drug are commonly
summarized across trials continuously during an investigational
product's development and, in particular, at the time of a marketing
application. The usefulness of this summary, however, is dependent
on adequate and well-controlled individual trials with high data
quality.
The overall usefulness of a drug is always a question of balance
between risk and benefit. In a single trial, such a perspective
could also be considered, even though the assessment of risk/benefit
is usually performed in the summary of the entire clinical trial
program (see section 7.2.2).
For more details on the reporting of safety and tolerability,
see section 12 of ICH E3.
VII. Reporting
7.1 Evaluation and Reporting
As stated in the introduction, the structure and content of
clinical study reports is the subject of ICH E3. That ICH guidance
fully covers the reporting of statistical work, appropriately
integrated with clinical and other material. The current section is
therefore relatively brief.
During the planning phase of a trial, the principal features of
the analysis should have been specified in the protocol as described
in section 5. When the conduct of the trial is over and the data are
assembled and available for preliminary inspection, it is valuable
to carry out the blind review of the planned analysis also described
in section 5. This pre-analysis review, blinded to treatment, should
cover, for example, decisions concerning the exclusion of subjects
or data from the analysis sets, the checking of possible
transformations and definitions of outliers, the addition to the
model of important covariates identified in other recent research,
and the reconsideration of the use of parametric or nonparametric
methods. Decisions made at this time should be described in the
report and should be distinguished from those made after the
statistician has had access to the treatment codes, as blind
decisions will generally introduce less potential for bias.
Statisticians or other staff involved in unblinded interim analysis
should not participate in the blind review or in making
modifications to the statistical analysis plan. When the blinding is
compromised by the possibility that treatment-induced effects may be
apparent in the data, special care will be needed for the blind
review.
Many of the more detailed aspects of presentation and tabulation
should be finalized at or about the time of the blind review so
that, by the time of the actual analysis, full plans exist for all
its aspects including subject selection, data selection and
modification, data summary and tabulation, estimation, and
hypothesis testing. Once data validation is complete, the analysis
should proceed according to the predefined plans; the more these
plans are adhered to, the greater the credibility of the results.
Particular attention should be paid to any differences between the
planned analysis and the actual analysis as described in the
protocol, the protocol amendments, or the updated statistical
analysis plan based on a blind review of data. A careful explanation
should be provided for deviations from the planned analysis.
All subjects who entered the trial should be accounted for in
the report, whether or not they are included in the analysis. All
reasons for exclusion from analysis should be documented; for any
subject included in the full analysis set but not in the per
protocol set, the reasons for exclusion from the latter should also
be documented. Similarly, for all subjects included in an analysis
set, the measurements of all important variables should be accounted
for at all relevant time-points.
The effect of all losses of subjects or data, withdrawals from
treatment, and major protocol violations on the main analyses of the
primary variable(s) should be considered carefully. Subjects lost to
followup, withdrawn from treatment, or with a severe protocol
violation should be identified and a descriptive analysis of them
provided, including the reasons for their loss and its relationship
to treatment and outcome.
Descriptive statistics form an indispensable part of reports.
Suitable tables and/or graphical presentations should illustrate
clearly the important features of the primary and secondary
variables and of key prognostic and demographic variables. The
results of the main analyses relating to the objectives of the trial
should be the subject of particularly careful descriptive
presentation. When reporting the results of significance tests,
precise p-values (e.g., ``p=0.034'') should be reported rather than
making exclusive reference to critical values.
Although the primary goal of the analysis of a clinical trial
should be to answer the questions posed by its main objectives, new
questions based on the observed data may well emerge during the
unblinded analysis. Additional and perhaps complex statistical
analysis may be the consequence. This additional work should be
strictly distinguished in the report from work which was planned in
the protocol.
The play of chance may lead to unforeseen imbalances between the
treatment groups in terms of baseline measurements not predefined as
covariates in the planned analysis but having some prognostic
importance nevertheless. This is best dealt with by showing that an
additional analysis which accounts for these imbalances reaches
essentially the same conclusions as the planned analysis. If this is
not the case, the effect of the imbalances on the conclusions should
be discussed.
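One common way of carrying out such an additional analysis is to refit
the treatment comparison with the imbalanced baseline measurement
included as a covariate; the following sketch does so by ordinary least
squares on hypothetical, simulated data and is illustrative only.

    # Illustrative sketch: unadjusted versus covariate-adjusted estimates
    # of a treatment effect, using ordinary least squares on hypothetical
    # simulated data.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    treatment = rng.integers(0, 2, size=n)   # 0 = control, 1 = investigational
    baseline = rng.normal(50, 10, size=n)    # prognostic baseline measurement
    outcome = 0.5 * treatment + 0.3 * baseline + rng.normal(0, 5, size=n)

    # Unadjusted estimate: difference in group means
    unadjusted = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()

    # Adjusted estimate: regression on treatment and the baseline covariate
    X = np.column_stack([np.ones(n), treatment, baseline])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    adjusted = coef[1]

    print(unadjusted, adjusted)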
In general, sparing use should be made of unplanned analyses.
Such analyses are often carried out when it is thought that the
treatment effect may vary according to some other factor or factors.
An attempt may then be made to identify subgroups of subjects for
whom the effect is particularly beneficial. The potential dangers of
over-interpretation of unplanned subgroup analyses are well known
(see also section 5.7) and should be carefully avoided. Although
similar problems of interpretation arise if a treatment appears to
have no benefit or an adverse effect in a subgroup of subjects, such
possibilities should be properly assessed and should therefore be
reported.
Finally, statistical judgement should be brought to bear on the
analysis, interpretation and presentation of the results of a
clinical trial. To this end, the trial statistician should be a
member of the team responsible for the clinical study report and
should approve the clinical report.
7.2 Summarizing the Clinical Database
An overall summary and synthesis of the evidence on safety and
efficacy from all the reported clinical trials is required for a
marketing application (expert report in EU, integrated summary
reports in the United States, gaiyou in Japan). This may be
accompanied, when appropriate, by a statistical combination of
results.
Within the summary a number of areas of specific statistical
interest arise: Describing the demography and clinical features of
the population treated during the course of the clinical trial
program; addressing the key questions of efficacy by considering the
results of the relevant (usually controlled) trials and highlighting
the degree to which they reinforce or contradict each other;
summarizing the safety information available from the combined
database of all the trials whose results contribute to the marketing
application; and identifying potential safety issues. During the
design of a clinical program, careful attention should be paid to
the uniform definition and collection of measurements which will
facilitate subsequent interpretation of the series of trials,
particularly if they are likely to be combined across trials. A
common dictionary for recording the details of medication, medical
history and adverse events should be selected and used. A common
definition of the primary and secondary variables is nearly always
worthwhile and is essential for meta-analysis. The manner of
measuring key efficacy variables, the timing of assessments relative
to randomization/entry, the handling of protocol violators and
deviators, and perhaps the definition of prognostic factors should
all be kept compatible unless there are valid reasons not to do so.
Any statistical procedures used to combine data across trials
should be described in detail. Attention should be paid to the
possibility of bias associated with the selection of trials, to the
homogeneity of their results, and to the proper modeling of the
various sources of variation. The sensitivity of conclusions to the
assumptions and selections made should be explored.
7.2.1 Efficacy Data
Individual clinical trials should always be large enough to
satisfy their objectives. Additional valuable information may also
be gained by summarizing a series of clinical trials that address
essentially identical key efficacy questions. The main results of
such a set of trials should be presented in an identical form to
permit comparison, usually in tables or graphs that focus on
estimates plus confidence limits. The use of meta-analytic
techniques to combine these estimates is often a useful addition
because it allows a more precise overall estimate of the size of the
treatment effects to be generated and provides a complete and
concise summary of the results of the trials. Under exceptional
circumstances, a meta-analytic approach may also be the most
appropriate way, or the only way, of providing sufficient overall
evidence of efficacy via an overall hypothesis test. When used for
this purpose, the meta-analysis should have its own prospectively
written protocol.
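One widely used technique for combining such estimates is fixed-effect
inverse-variance weighting, sketched below with hypothetical trial-level
estimates and standard errors; the choice of combination method should
itself be justified in the meta-analysis protocol.

    # Illustrative sketch: fixed-effect (inverse-variance) combination of
    # trial-level treatment effect estimates. Estimates and standard
    # errors are hypothetical; the choice of model (fixed versus random
    # effects) should be justified prospectively.

    from math import sqrt

    def inverse_variance_pooled(estimates, std_errors, z=1.96):
        weights = [1.0 / se**2 for se in std_errors]
        pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
        pooled_se = sqrt(1.0 / sum(weights))
        return pooled, (pooled - z * pooled_se, pooled + z * pooled_se)


    # Hypothetical treatment differences from three trials.
    print(inverse_variance_pooled([1.8, 2.4, 1.1], [0.9, 1.2, 0.7]))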
7.2.2 Safety Data
In summarizing safety data, it is important to examine the
safety database thoroughly for any indications of potential toxicity
and to follow up any indications by looking for an associated
supportive pattern of observations. The combination of the safety
data from all human exposure to the drug provides an important
source of information because its larger sample size provides the
best chance of detecting the rarer adverse events and, perhaps, of
estimating their approximate incidence. However, incidence data from
this database are difficult to evaluate because of the lack of a
comparator group, and data from comparative trials are especially
valuable in overcoming this difficulty. The results from trials
which use a common comparator (placebo or specific active
comparator) should be combined and presented separately for each
comparator providing sufficient data.
All indications of potential toxicity arising from exploration
of the data should be reported. The evaluation of the reality of
these potential adverse effects should take into account the issue
of multiplicity arising from the numerous comparisons made. The
evaluation should also make appropriate use of survival analysis
methods to exploit the potential relationship of the incidence of
adverse events to duration of exposure and/or followup. The risks
associated with identified adverse effects should be appropriately
quantified to allow a proper assessment of the risk/benefit
relationship.
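For example, a Kaplan-Meier (product-limit) estimate of the cumulative
incidence of a given adverse event over exposure time can be computed
as in the following sketch; the times and event indicators are
hypothetical, and this simple estimator ignores competing risks.

    # Illustrative sketch: Kaplan-Meier estimate of the cumulative
    # incidence of one adverse event over duration of exposure. Times (in
    # weeks) and event indicators are hypothetical; competing risks are
    # ignored for simplicity.

    def km_cumulative_incidence(times, events):
        """times: follow-up time per subject; events: 1 if the adverse event
        occurred at that time, 0 if the subject was censored."""
        data = sorted(zip(times, events))
        at_risk = len(data)
        survival = 1.0
        curve = []  # (time, cumulative incidence = 1 - S(t))
        i = 0
        while i < len(data):
            t = data[i][0]
            d = sum(1 for time, e in data if time == t and e == 1)  # events at t
            if d > 0:
                survival *= (1 - d / at_risk)
                curve.append((t, 1 - survival))
            at_risk -= sum(1 for time, _ in data if time == t)      # events + censorings at t
            while i < len(data) and data[i][0] == t:
                i += 1
        return curve


    # Hypothetical data: 8 subjects, 1 = adverse event, 0 = censored.
    times = [2, 4, 4, 6, 8, 10, 12, 12]
    events = [1, 1, 0, 1, 0, 1, 0, 0]
    print(km_cumulative_incidence(times, events))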
Annex 1 Glossary
Bayesian approaches--Approaches to data analysis that provide a
posterior probability distribution for some parameter (e.g.,
treatment effect), derived from the observed data and a prior
probability distribution for the parameter. The posterior
distribution is then used as the basis for statistical inference.
Bias (statistical and operational)--The systematic tendency of
any factors associated with the design, conduct, analysis and
evaluation of the results of a clinical trial to make the estimate
of a treatment effect deviate from its true value. Bias introduced
through deviations in conduct is referred to as ``operational''
bias. The other sources of bias listed above are referred to as
``statistical.''
Blind review--The checking and assessment of data during the
period of time between trial completion (the last observation on the
last subject) and the breaking of the blind, for the purpose of
finalizing the planned analysis.
Content validity--The extent to which a variable (e.g., a rating
scale) measures what it is supposed to measure.
Double dummy--A technique for retaining the blind when
administering supplies in a clinical trial, when the two treatments
cannot be made identical. Supplies are prepared for Treatment A
(active and indistinguishable placebo) and for Treatment B (active
and indistinguishable placebo). Subjects then take two sets of
treatment; either A (active) and B (placebo), or A (placebo) and B
(active).
Dropout--A subject in a clinical trial who for any reason fails
to continue in the trial until the last visit required of him/her by
the study protocol.
Equivalence trial--A trial with the primary objective of showing
that the response to two or more treatments differs by an amount
which is clinically unimportant. This is usually demonstrated by
showing that the true treatment difference is likely to lie between
a lower and an upper equivalence margin of clinically acceptable
differences.
Frequentist methods--Statistical methods, such as significance
tests and confidence intervals, which can be interpreted in terms of
the frequency of certain outcomes occurring in hypothetical repeated
realizations of the same experimental situation.
Full analysis set--The set of subjects that is as close as
possible to the ideal implied by the intention-to-treat principle.
It is derived from the set of all randomized subjects by minimal and
justified elimination of subjects.
Generalizability, generalization--The extent to which the
findings of a clinical trial can be reliably extrapolated from the
subjects who participated in the trial to a broader patient
population and a broader range of clinical settings.
Global assessment variable--A single variable, usually a scale
of ordered categorical ratings, that integrates objective variables
and the investigator's overall impression about the state or change
in state of a subject.
Independent data monitoring committee (IDMC) (data and safety
monitoring board, monitoring committee, data monitoring committee)--
An independent data monitoring committee that may be established by
the sponsor to assess at intervals the progress of a clinical trial,
the safety data, and the critical efficacy endpoints, and to
recommend to the sponsor whether to continue, modify, or stop a
trial.
Intention-to-treat principle--The principle that asserts that
the effect of a treatment policy can be best assessed by evaluating
on the basis of the intention to treat a subject (i.e., the planned
treatment regimen) rather than the actual treatment given. It has
the consequence that subjects allocated to a treatment group should
be followed up, assessed, and analyzed as members of that group
irrespective of their compliance with the planned course of
treatment.
Interaction (qualitative and quantitative)--The situation in
which a treatment contrast (e.g., difference between investigational
product and control) is dependent on another factor (e.g., center).
A quantitative interaction refers to the case where the magnitude of
the contrast differs at the different levels of the factor, whereas
for a qualitative interaction the direction of the contrast differs
for at least one level of the factor.
Interrater reliability--The property of yielding equivalent
results when used by different raters on different occasions.
Intrarater reliability--The property of yielding equivalent
results when used by the same rater on different occasions.
Interim analysis--Any analysis intended to compare treatment
arms with respect to efficacy or safety at any time prior to the
formal completion of a trial.
Meta-analysis--The formal evaluation of the quantitative
evidence from two or more
trials bearing on the same question. This most commonly involves the
statistical combination of summary statistics from the various
trials, but the term is sometimes also used to refer to the
combination of the raw data.
Multicenter trial--A clinical trial conducted according to a
single protocol but at more than one site and, therefore, carried
out by more than one investigator.
Noninferiority trial--A trial with the primary objective of
showing that the response to the investigational product is not
clinically inferior to a comparative agent (active or placebo
control).
Preferred and included terms--In a hierarchical medical
dictionary, for example, the World Health Organization's Adverse
Reaction Terminology (WHO-Art), the included term is the lowest
level of dictionary term to which the investigator description is
coded. The preferred term is the level of grouping of included terms
typically used in reporting frequency of occurrence. For example,
the investigator text ``Pain in the left arm'' might be coded to the
included term ``Joint pain,'' which is reported at the preferred
term level as ``Arthralgia.''
Per protocol set (valid cases, efficacy sample, evaluable
subjects sample)--The set of data generated by the subset of
subjects who complied with the protocol sufficiently to ensure that
these data would be likely to exhibit the effects of treatment
according to the underlying scientific model. Compliance covers such
considerations as exposure to treatment, availability of
measurements, and absence of major protocol violations.
Safety and tolerability--The safety of a medical product
concerns the medical risk to the subject, usually assessed in a
clinical trial by laboratory tests (including clinical chemistry and
hematology), vital signs, clinical adverse events (diseases, signs
and symptoms), and other special safety tests (e.g.,
electrocardiograms, ophthalmology). The tolerability of the medical
product represents the degree to which overt adverse effects can be
tolerated by the subject.
Statistical analysis plan--A statistical analysis plan is a
document that contains a more technical and detailed elaboration of
the principal features of the analysis described in the protocol,
and includes detailed procedures for executing the statistical
analysis of the primary and secondary variables and other data.
Superiority trial--A trial with the primary objective of showing
that the response to the investigational product is superior to a
comparative agent (active or placebo control).
Surrogate variable--A variable that provides an indirect
measurement of effect in situations where direct measurement of
clinical effect is not feasible or practical.
Treatment effect--An effect attributed to a treatment in a
clinical trial. In most clinical trials, the treatment effect of
interest is a comparison (or contrast) of two or more treatments.
Treatment emergent--An event that emerges during treatment,
having been absent pretreatment, or worsens relative to the
pretreatment state.
Trial statistician--A statistician who has a combination of
education/training and experience sufficient to implement the
principles in this guidance and who is responsible for the
statistical aspects of the trial.
Dated: September 8, 1998.
William K. Hubbard,
Associate Commissioner for Policy Coordination.
[FR Doc. 98-24754 Filed 9-15-98; 8:45 am]
BILLING CODE 4160-01-F