Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research
John W. Creswell
University of Nebraska–Lincoln
FOURTH EDITION
EXPERIMENTAL DESIGN
An experimental design is the traditional approach to conducting quantitative research. This chapter defines experimental research, identifies when you use it, assesses its key characteristics, and advances the steps in conducting and evaluating this design.
By the end of this chapter, you should be able to:
◆ Define experimental research, and describe when to use it and how it developed.
◆ Identify the key characteristics of experiments.
◆ State the types of experimental designs.
◆ Recognize potential ethical issues in experimental research.
◆ Describe the steps in conducting an experiment.
◆ Evaluate the quality of an experimental study.
What Is an Experiment?
In an experiment, you test an idea (or practice or procedure) to determine whether it influences an outcome or dependent variable. You first decide on an idea with which to “experiment,” assign individuals to experience it (and have some individuals experience something different), and then determine whether those who experienced the idea (or practice or procedure) performed better on some outcome than those who did not experience it. In Maria’s experiment, she tested whether the special health curriculum changed students’ attitudes toward weapons in schools.
When Do You Use an Experiment?
You use an experiment when you want to establish possible cause and effect between your independent and dependent variables. This means that you attempt to control all variables that influence the outcome except for the independent variable. Then, when the independent variable influences the dependent variable, we can say the independent variable “caused” or “probably caused” the dependent variable. Because experiments are controlled, they are the best of the quantitative designs to use to establish probable cause and effect. For example, if you compare one group that experiences a lecture and another group that experiences discussion, you control all of the factors that might influence the outcome of “high scores on a quiz.” You make sure that personal abilities and test conditions are the same for both groups, and you give both groups the same questions. You control for all variables that might influence the outcome except for the difference in types of instruction (lecture or discussion). You also use an experiment when you have two or more groups to study, as in this lecture versus discussion example.
When Did Experiments Develop?
Experimental research began in the late 19th and early 20th centuries, with psychological experiments. By 1903, Schuyler used experimental and control groups, and his use became so commonplace that he felt no need to provide a rationale for them. Then in 1916, McCall advanced the idea of randomly assigning individuals to groups (Campbell & Stanley, 1963). Authoring a major book in 1925, How to Conduct an Experiment, McCall firmly established the procedure of comparing groups. In addition, by 1936, Fisher’s book Statistical Methods for Research Workers discussed statistical procedures useful in experiments in psychology and agriculture. In this book, Fisher advanced the concept of randomly assigning individuals to groups before starting an experiment. Other developments in statistical procedures at this time (e.g., chi-square goodness of fit and critical values) and the testing of the significance of differences (e.g., Fisher’s 1935 The Design of Experiments) enhanced experimental research in education. Between 1926 and 1963, five sets of textbooks on statistics had undergone multiple editions (Huberty, 1993). By 1963, Campbell and Stanley had identified the major types of experimental designs. They specified 15 different types and evaluated each design in terms of potential threats to validity. These designs are still popular today. Then, in 1979, Cook and Campbell elaborated on the types of designs, expanding the discussion about validity threats. By 2002, Shadish, Cook, and Campbell had refined the discussions about the major experimental designs. These books established the basic designs, the notation, the visual representation, the potential threats to designs, and the statistical procedures of educational experiments.
What Are the Key Characteristics of Experiments?
Before you consider how to conduct an experiment, you will find it helpful to understand in more depth several key ideas central to experimental research. These ideas are:
◆ Random assignment
◆ Control over extraneous variables
◆ Manipulation of the treatment conditions
◆ Outcome measures
◆ Group comparisons
◆ Threats to validity
To make this discussion as applied as possible, we
will use an educational example to illustrate these ideas. A researcher seeks
to study ways to encourage adolescents to reduce or stop smoking. A high school
has an in-house program to treat individuals caught smoking on school grounds.
In this large metropolitan high school, many students smoke, and the smoking
infractions each year are numerous. Students who are caught take a special civics class
(all students are required to take civics anyway) in which the teacher
introduces a special unit on the health hazards of smoking. In this unit, the
teacher discusses health issues, uses images and pictures of the damaged lungs
of smokers, and has students write about
their experiences as smokers. This instructor
offers several civics classes during a semester, and we will refer to this
experimental situation as the “civics–smoking experiment.”
Random Assignment
As an experimental researcher, you will assign
individuals to groups. The most rigorous approach is to randomly assign
individuals to the treatments. Random assignment is the process of assigning
individuals at random to groups or to different groups in an experiment. The
random assignment of individuals to groups (or conditions within a group)
distinguishes a rigorous, “true” experiment from an adequate, but
less-than-rigorous, “quasi-experiment” (to be discussed later in the chapter).
You use random assignment so that any bias in the personal characteristics of individuals in the experiment is distributed equally among the groups. By randomization, you provide control for extraneous characteristics of the participants that might influence the outcome (e.g., student ability, attention span, motivation). The experimental term for this process is “equating” the groups. Equating the groups means that the researcher randomly assigns individuals to groups and equally distributes any variability of individuals between or among the groups or conditions in the experiment. In practice, personal factors that participants bring to an experiment can never be totally controlled—some bias or error will always affect the outcome of a study. However, by systematically distributing this potential error among groups, the researcher theoretically distributes the bias randomly. In our civics–smoking experiment,
the researcher can take the list of offender smokers in the school and randomly
assign them to one of two special civics classes. You should not confuse random
assignment with random selection. Both
are important in quantitative research, but they serve different purposes.
Quantitative researchers randomly select a sample from a population. In this
way, the sample is representative of the population and you can generalize
results obtained during the study to the population. Experiments often do not
include random selection of participants for several reasons. Participants
often are individuals who are available to take part in the experiment or who
volunteer to participate. Although random selection is important in
experiments, it may not be logistically possible. However, the most sophisticated
type of experiment involves random assignment. In the civics–smoking
experiment, you may randomly select individuals from the population of offender
smokers (especially if there are too many for the special civics classes).
However, you will most likely place all of the offenders in the special civics
classes, giving you control over random assignment rather than random
selection.
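The randomization logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure from the text; the roster of “offender smokers” and the seed value are hypothetical.

```python
import random

def randomly_assign(participants, n_groups=2, seed=None):
    """Equate groups by shuffling participants and dealing them
    round-robin into n_groups conditions."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)  # randomization spreads personal factors across groups
    return [pool[i::n_groups] for i in range(n_groups)]

# Hypothetical roster of offender smokers assigned to two civics classes
offenders = [f"student_{i}" for i in range(1, 21)]
class_a, class_b = randomly_assign(offenders, n_groups=2, seed=42)
print(len(class_a), len(class_b))  # 10 10
```

Note that this sketch performs random assignment only; whether the 20 participants were randomly selected from a larger population is a separate question, as the passage above explains.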
Control Over Extraneous Variables
In randomly assigning individuals, we say that we are controlling for extraneous variables that might influence the relationship between the new practice (e.g., discussions on health hazards) and the outcome (e.g., frequency of smoking). Extraneous factors are any influences in the selection of participants, the procedures, the statistics, or the design likely to affect the outcome and provide an alternative explanation for the results other than the one we expected. All experiments have some random error (where the scores do not reflect the “true” scores of the population) that you cannot control, but you can try to control extraneous factors as much as possible. Random
assignment is a decision made by the investigator before the experiment begins.
Other control procedures you can use both before and during the experiment are
pretests, covariates, matching of participants, homogeneous samples, and
blocking variables.
Manipulating Treatment Conditions
Once you select participants, you randomly assign them to either a control condition or an experimental (treatment) condition. In an experimental treatment, the researcher physically intervenes to alter the conditions experienced by the experimental unit (e.g., a reward for good spelling performance or a special type of classroom instruction, such as small-group discussion). In our high school example, the researcher would manipulate one
form of instruction in the special civics class—providing activities on the
health hazards of smoking.
Specifically, the procedure would be:
◆ Identify a treatment variable: type of classroom instruction in the civics class
◆ Identify the conditions (or levels) of the variable: classroom instruction can be (a) regular topics or (b) topics related to the health hazards of smoking
◆ Manipulate the treatment conditions: provide special activities on health hazards of smoking to one class and withhold them from another class
These procedures introduce several new concepts that we will discuss using specific examples so that you can see how they work.
Treatment Variables
In experiments, you need to focus on the independent variables. These variables influence or affect the dependent variables in a quantitative study. The two major types of independent variables are treatment and measured variables. In experiments, treatment variables are independent variables that the researcher manipulates to determine their effect on the outcome, or dependent variable. Treatment variables are categorical variables measured using categorical scales. For example, treatment independent variables used in educational experiments might be:
◆ Type of instruction (small group, large group)
◆ Type of reading group (phonics readers, whole-language readers)
Conditions
In both of these examples, we have two categories
within each treatment variable. In experiments, treatment variables need to
have two or more categories, or levels. In an experiment, levels are categories
of a treatment variable. For example, you might divide type of instruction into
(a) standard civics lecture, (b) standard civics lecture plus discussion about
health hazards, and (c) standard civics lecture plus discussion about health
hazards and slides of damaged lungs. In this example, we have a three-level
treatment variable.
Intervening in the Treatment Conditions
The experimental researcher manipulates one or more of the treatment variable conditions. In other words, in an experiment, the researcher physically intervenes (or manipulates with interventions) in one or more conditions so that individuals experience something different in the experimental conditions than in the control conditions. This means that to conduct an experiment, you need to be able to manipulate at least one condition of an independent variable. It is easy to identify some situations in which you might measure an independent variable and obtain categorical data but not be able to manipulate one of the conditions. As shown in Figure 10.3, the researcher measures three independent variables—age, gender, and type of instruction—but only type of instruction (more specifically, two conditions within it) is manipulated. The treatment variable—type of instruction—is a categorical variable with three conditions (or levels).
Some students can receive a lecture—the traditional
form of instruction in the class (the control group). Others receive something
new, such as a lecture plus the health-hazards discussion (a comparison group)
or lecture plus the health-hazards discussion plus slides of lungs damaged by
smoking (another comparison group). In summary, experimental researchers
manipulate or intervene with one or more conditions of a treatment
variable.
Outcome Measures
In all experimental situations, you assess whether a treatment condition influences an outcome or dependent variable, such as a reduced rate of smoking or achievement on tests. In experiments, the outcome (or response, criterion, or posttest) is the dependent variable that is the presumed effect of the treatment variable. It is also the effect predicted in a hypothesis in the cause-and-effect equation. Examples of dependent variables in experiments might be:
◆ Achievement scores on a criterion-referenced test
◆ Test scores on an aptitude test
FIGURE 10.3
The Experimental Manipulation of a Treatment Condition

Independent variables:
1. Age (cannot manipulate)
2. Gender (cannot manipulate)
3. Type of instruction (can manipulate)
   a. Some receive lecture (control)
   b. Some receive lecture plus health-hazard discussion (comparison)
   c. Some receive lecture plus health-hazard discussion plus slides of lungs damaged by smoking (experimental)

Dependent variable:
Frequency of smoking
Good outcome measures are sensitive to treatments
in that they respond to the smallest amount of intervention. Outcome measures
(as well as treatment variables) also need to be valid so that experimental
researchers can draw valid inferences from them.
Group Comparisons
In an experiment, you also compare scores for
different treatments on an outcome. A
group comparison is the process
of a researcher obtaining scores for individuals or groups on the dependent
variable and comparing the means and variance both within the group and between
the groups. (See Keppel [1991] for detailed statistical procedures for this
process.) To visualize this process, let’s consider some actual data from an
experiment by Gettinger (1993), who sought to determine the effects of an error
correction procedure on the spelling of third graders. As shown in Figure 10.4,
we visualize Gettinger’s experiment in three ways.
Gettinger examined whether the error correction
procedure related positively to spelling accuracy (Phase 1). She then created
three groups of students: Class A, Class B, and Class C. Class A (the control
group) received regular spelling practice on 15 words, consisting of workbook
exercises, writing sentences containing each word, and studying words on their
own. Class B (the comparison group) had the same experience except that they
studied a reduced number of words on a list—three sets of five words each.
Class C (the experimental group) used an error-and-correction practice
procedure consisting of correcting their own tests, noting incorrect words, and
writing both the
incorrect and correct spelling for each word. As shown
in Phase 2, all three groups received the same spelling practice for 6 weeks,
then the experimental group received the error correction procedure for 6
weeks, and after a third 6 weeks, all three groups were tested. Phase 3 shows
the statistical comparisons made among the three groups on each of the three
tests. Class A improved slightly (from 10.3 on Test 1 to 11.1 on Test 3), whereas
Class B’s scores decreased over the three tests. Class C, the experimental
group, improved considerably. F-test values showed that the scores varied significantly on Test 2 and Test 3 when the researcher compared the groups. These statistical comparisons took into consideration both the mean scores and the variation between and within each group to arrive at the reported significance values.
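The between- and within-group comparison behind an F test can be sketched with the standard one-way ANOVA formulas. The three score lists below are illustrative stand-ins for three classes, not Gettinger’s actual data.

```python
def one_way_f(*groups):
    """Compute the one-way ANOVA F statistic: between-group variance
    divided by within-group variance."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Illustrative spelling-test scores for a control, comparison, and experimental class
class_a = [10, 11, 11, 12, 10]
class_b = [9, 10, 8, 9, 10]
class_c = [13, 14, 13, 15, 14]
f_stat = one_way_f(class_a, class_b, class_c)
```

A large F value, as here, indicates that the group means differ by more than the within-group variation alone would explain; in practice you would compare it against a critical value for the appropriate degrees of freedom.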
Threats to Validity
A final idea in experiments is to design them so that the inferences you draw are true or correct. Threats to drawing these correct inferences need to be addressed in experimental research. Threats to validity refer to specific reasons for why we can be wrong when we make an inference in an experiment because of covariance, causation constructs, or whether the causal relationship holds over variations in persons, settings, treatments, and outcomes (Shadish, Cook, & Campbell, 2002). Four types of validity they discuss are:
◆ Statistical conclusion validity, which refers to the appropriate use of statistics (e.g., violating statistical assumptions, restricted range on a variable, low power) to infer whether the presumed independent and dependent variables covary in the experiment.
◆ Construct validity, which means the validity of inferences about the constructs (or variables) in the study.
◆ Internal validity, which relates to the validity of inferences drawn about the cause-and-effect relationship between the independent and dependent variables.
◆ External validity, which refers to the validity of the cause-and-effect relationship being generalizable to other persons, settings, treatment variables, and measures.
These
threats to validity have evolved over the years from the initial discussions by
Campbell and Stanley (1963) , to the elaboration of their use by Cook and
Campbell (1979) , and more recently by Shadish, Cook, and Campbell (2002) . The
basic ideas are still intact, but more recent discussions have elaborated on
the points. Our discussion here will focus on the two primary threats to
consider: internal validity and external validity.
Between-Group Designs
The most frequently used designs in education are
those where the researcher compares two or more groups. Illustrations
throughout this chapter underscore the importance of these designs. We will
begin with the most rigorous between-group design available to the educational
researcher, the true experiment.
True Experiments
True experiments comprise the most rigorous and strongest experimental designs because they equate the groups through random assignment. The procedure for conducting major forms of true experiments and
quasi-experiments, viewing them in terms of activities from the beginning of
the experiment to the end, is shown in Table
10.3. In true experiments, the researcher randomly assigns participants
to different conditions of the experimental variable. Individuals in the
experimental group receive the experimental treatment, whereas those in the
control group do not. After investigators administer the treatment, they
compile average (or mean) scores on a posttest. One variation on this design is
to obtain pretest as well as posttest measures or observations. When experimenters
collect pretest scores, they may compare net scores (the differences between the
pre- and posttests). Alternatively, investigators may relate the pretest scores
for the control and experimental groups to see if they are statistically
similar, and then compare the two posttest group scores. In many experiments,
the pretest is a covariate and is statistically controlled by the
researcher. Because you randomly assign
individuals to the groups, most of the threats to internal validity do not
arise. Randomization or equating of the groups minimizes the possibility of
history, maturation, selection, and the interactions between selection and
other threats. Treatment threats such as diffusion, rivalry, resentful
demoralization, and compensatory equalization are all possibilities in a
between-group design because two or more groups exist in the design. When true
experiments include only a posttest, it reduces the threats of testing,
instrumentation, and regression because you do not use a pretest. If a pretest is
used, it introduces all of these factors as possible threats to validity.
Instrumentation exists as a potential threat in most experiments, but if researchers use the same or similar instrument for the pre- and posttest or enact standard procedures during the study, they hold instrumentation threats to a minimum.
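The net-score comparison described above for a pretest–posttest true experiment can be sketched as follows. The attitude scores and group labels are hypothetical, chosen only to show the mechanics of comparing gains.

```python
def mean(xs):
    return sum(xs) / len(xs)

def net_scores(pre, post):
    """Each participant's posttest minus pretest score (the gain)."""
    return [b - a for a, b in zip(pre, post)]

# Hypothetical attitude scores (higher = healthier attitude toward smoking);
# each list position is the same participant at pretest and posttest
experimental_pre  = [40, 42, 38, 45, 41]
experimental_post = [52, 50, 47, 55, 49]
control_pre  = [41, 39, 44, 40, 43]
control_post = [43, 40, 45, 41, 44]

exp_gain = mean(net_scores(experimental_pre, experimental_post))
ctl_gain = mean(net_scores(control_pre, control_post))
print(exp_gain, ctl_gain)  # the treatment group shows the larger mean gain
```

Comparing mean gains is only one option; as the passage notes, researchers may instead check that pretest scores are statistically similar and compare posttests, or treat the pretest as a covariate and control for it statistically.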
Quasi-Experiments
In education, many experimental situations occur in which researchers need to use intact groups. This might happen because of the availability of the participants or because the setting prohibits forming artificial groups. Quasi-experiments include assignment, but not random assignment, of participants to groups. This is because the experimenter cannot artificially create groups for the experiment. For example, studying a new math program may require using existing fourth-grade classes and designating one as the experimental group and one as the control group. Randomly assigning students to the two groups would disrupt classroom learning. Because educators often use intact groups (schools, colleges, or school districts) in experiments, quasi-experimental designs are frequently used.
Returning to
Table 10.3, we can apply the pre-
and posttest design approach to a quasi-experimental design. The researcher
assigns intact groups the experimental and control treatments, administers a
pretest to both groups, conducts experimental treatment activities with the
experimental group only, and then administers a posttest to assess the differences
between the two groups. A variation on this approach, similar to the true experiment,
uses only a posttest in the design. The quasi-experimental approach introduces
considerably more threats to internal validity than the true experiment.
Because the investigator does not randomly assign participants to groups, the
potential threats of maturation, selection, mortality, and the interaction of
selection with other threats are possibilities. Individuals assigned to the two
groups may have selection factors that go uncontrolled in the experiment.