Updated: 10/9/12 @ 10:41 GMT+5
First of all, while I am NOT an expert statistician, I do know more than the average person and have had formal training/education in the subject. (I've had some undergraduate and graduate stats classes, used statistics professionally & throughout undergrad, etc.) So I would love to have a "pure" (read: non-education-related) statistician go over the data, but I think I have made a decent start here.
If there's one thing I hate hearing anymore in education, it is "data," because most of the data they have is crap and/or they have no idea what to do with it. In the case of Marzano, I often hear "him" (his company & his salespeople) going on about how much research he has. It's true: his database has 1036 studies in it (I copied & pasted these into a text file, then imported them into Excel as a "delimited" file). But let's take a closer look, because as just about everyone with any connection to the professoriate knows, the quality of educational research is (often extremely) suspect. This is largely due to the difficulties of doing research on kids (particularly longitudinal research), but there are other problems as well.
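(For anyone who wants to replay the cleanup below outside of Excel, here's a minimal Python/pandas sketch of the import step. The file name and column names are my assumptions about the export, not Marzano's actual format.)

```python
import pandas as pd

# Assumption: the copied database was saved as tab-delimited text with a
# header row; adjust the file name, separator, and column names to match.
studies = pd.read_csv("marzano_studies.txt", sep="\t")

print(len(studies))               # expect 1036 rows
print(studies.columns.tolist())   # e.g. ['study', 'n', 'effect_size', 'p_value', ...]
```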
If you sort the data by p-values, you quickly start to see some problems. Marzano's own website declares, "Basically, if the value in this column is less than .05, the effect size reported for the study can be considered statistically significant at a significance level of 5% (α = .05). In other words, a reasonable inference can be made that the reported effect size is probably not a function of random factors; rather, the reported effect size represents a real change in student learning."
So sort the data by the p-value, delete those that are greater than 0.05, and look what happens: you're down to 285 studies from the initial 1036. That means that by his own criteria, only 27.5% of his data is statistically significant (at α = 0.05). (Or, taken another way: for nearly three-quarters of the studies, we cannot rule out that the reported "results" are just random fluctuation between the control and experimental groups.) Of the remaining 285 studies, 101 of them have a p-value of zero; I assume this means that either it wasn't reported OR the experiment was so well run that they were able to calculate the p-value down to less than 0.001. The latter is unrealistic. (For example, one study has just 4 data points and an "effect size" of 9.25; since these effect sizes are measured in standard deviations, that would mean the experimental group outperformed the control by more than nine standard deviations, which is grossly unrealistic. I don't see how any self-respecting statistician could use or report this, to be blunt.) So incomplete/unrealistic data in my book gets thrown out--we're down to 185 studies (17.9% of his database).
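In pandas terms, the two cuts so far look like this (continuing the sketch above; the column name is still an assumption):

```python
# Keep only the studies Marzano's own criterion calls significant (p <= .05).
significant = studies[studies["p_value"] <= 0.05]
print(len(significant))   # 285 of the original 1036

# Then drop the p = 0 entries (unreported, or implausibly precise).
plausible = significant[significant["p_value"] > 0]
print(len(plausible))     # the ~185 studies discussed above
```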
We're not done yet. Here are a few other data points I'm going to throw out because I find them too suspect to be reliable for district-wide policy-setting decisions: studies involving fewer than 18 students (n): 79. Admittedly, this cutoff is somewhat arbitrary, but I could probably defend their exclusion* far better than anyone could defend their inclusion. That brings us down to 106 studies (10.2% of the total). I'm going to stop there, but notice that there are also 16 studies that are incomplete; they have no unit length. Another 21 studies lasted less than a week. Two studies have controls of fewer than 10 students, which "seems" too low (one is 4, the other 9). So even this remaining 10% is somewhat dubious. But it's not the ten percent that really bothers me, it's the 90%, because Marzano's work--which sadly is influencing policy--is based on all of this bad data. One other quick question: how much time was spent on each of these 1000+ studies? (I'm guessing not a lot; see below.) In other words, the policy is based on the research, and the research relies on unreliable data. So does it surprise anyone that all of these new policies only seem to make things worse?
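Finishing the sketch, here are the sample-size cut and the leftover red flags (the extra column names--unit length, duration, control size--are again my assumptions about how the export is labeled):

```python
# Drop studies with fewer than 18 students.
robust = plausible[plausible["n"] >= 18]
print(len(robust))        # 106 studies, ~10% of the original database

# The remaining red flags, for anyone who wants to keep cutting:
print(robust["unit_length"].isna().sum())    # no unit length reported (16)
print((robust["duration_days"] < 7).sum())   # lasted less than a week (21)
print((robust["control_n"] < 10).sum())      # control group under 10 students (2)
```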
Also worth reading: "Marzano - A Successful Fraud," a review of Marzano et al.'s Classroom Instruction that Works... To quote the Amazon.com review:
A. Every single reference I checked was itself dubious or misrepresented by the authors.
B. Some of the references were on topics unrelated to the instructional strategies cited.
B. [sic] Some of the numbers from published data were altered to better conform to the author's point of view.
C. Some of the references themselves presented provisional conclusions based on weak results, but were given complete credence by Marzano et al.
D. The authors took weak data from several studies, each based on averaging the results from studies assumed to use similar methods and subject cohorts, and averaged these, compounding the statistical weaknesses. This is especially shocking given that no credible researcher would combine results from studies by different groups that clearly use different methodologies and subject cohorts.
* My rationale for excluding studies of < 18 students: this is less than a typical classroom and, more importantly, probably fewer data points than necessary for reliable statistical analysis. (I was always told to use at least 30 data points, but that was in a field--Science/Engineering--that has much more rigorous standards than the social sciences, let alone education.)
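To put a number on that rule of thumb, here's a standard two-sample power calculation (sketched with statsmodels; the target effect size and power level are my choices, not anything from Marzano's data):

```python
from statsmodels.stats.power import TTestIndPower

# Students needed *per group* to detect a medium effect (Cohen's d = 0.5)
# at alpha = 0.05 with 80% power -- conventional, if arguable, defaults.
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # ~64 per group, i.e. ~128 students total
```

By that yardstick, even the 18-student cutoff above is generous.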