"An Urgent Need for Short-Cycle Clinical Trials in Education"

Over at the Brookings blog, Tom Kane writes:

The urgent need for the crisp evidence provided by randomized trials cannot be underestimated.  Strangely, education leaders speak about “effective professional development” in the same way that foreign policy experts speak of Middle East peace:  everyone says it’s essential, but no one seems to believe it can work!  That’s a problem, because it’s impossible to rally a groundswell of support unless you believe (and can demonstrate) your strategy will work.

Read the whole thing here.

Four thoughts:

1. Crisp evidence is one reason Boston charters are growing. 

Back in 2009, Tom Kane, MIT's Josh Angrist, and others did a study of Boston charter schools and Boston pilot schools.  Remember, the typical US charter school is just okay.  The study showed Boston's charters were unusually good -- compared to the pilots and district schools in that study, and, if you extrapolated the data, compared to other charters across the USA.

This study didn't much influence the stalwart charter opponents or advocates.  They're dug in.  But it did seem to have an effect on various moderates, including writers for the Boston Globe editorial page.  I'll be talking about this study and its effects next month in Tom's class.

That study was later validated by the Stanford CREDO study.  An interesting note: similarly crisp evidence shows that while Boston charters help kids achieve unusually large academic growth, non-Boston Massachusetts charters do not. 

2. I have called for "short-cycle clinical trials" too.  Here is my essay from Education Next.  The difference between Tom and me, besides many obvious things, like that he is a leading scholar and I am not?

Tom writes:

Over the last decade, the U.S. Department of Education has built a robust infrastructure for evaluating programs at the federal level.  The National Center for Education Evaluation has funded more than 34 large-scale impact evaluations since 2002.  These cost an average of $12 million and have taken 5.6 years to complete.  Such a system might be sufficient in countries where education policy decisions are made at the federal level and where there is greater continuity of leadership. 

Unfortunately, state and local decision-makers—who play the critical role in the U.S. system—too rarely make the connection between the lessons learned in federally-funded evaluations and their own policy decisions.  We need a new model to supplement these federal efforts, which is faster, less expensive and more closely tied to the decisions being made at the state and local level.

Faster?  Yes.  Less expensive?  Yes.  Measuring decisions made at the state level?  Here I depart.  I feel the more urgent need is to analyze decisions made at the individual teacher level.

I wrote:

One IES project is the What Works Clearinghouse (WWC), established in 2002 to provide “a central and trusted source of scientific evidence for what works in education.” The WWC web site lists topic areas like beginning reading, adolescent literacy, high school math, and the like. For each topic, WWC researchers summarize and evaluate the rigor of published studies of products and interventions. One might find on the WWC site evidence on the relative effectiveness of middle-school math curricula or of strategies to encourage girls in science, for example.

But there is almost nothing examining the thousands of moves teachers must decide on and execute every school day. Should I ask for raised hands, or cold-call? Should I give a warning or a detention? Do I require this student to attend my afterschool help session, or make it optional? Should I spend 10 minutes grading each five-paragraph essay, 20 minutes, or just not pay attention to time and work on each until it “feels” done?

And the WWC’s few reviews of research on teacher moves aren’t particularly helpful. A 63-page brief on the best teaching techniques identifies precisely two with “strong evidence”: giving lots of quizzes and asking deep questions. An 87-page guide on reducing misbehavior has five areas of general advice that “research supports,” but no concrete moves for teachers to implement. It reads, “[Teachers should] consider parents, school personnel, and behavioral experts as allies who can provide new insights, strategies, and support.” What does not exist are experiments with results like this: “A randomized trial found that a home visit prior to the beginning of a school year, combined with phone calls to parents within 5 hours of an infraction, results in a 15 percent drop in the same misbehavior on the next day.” If that existed, perhaps teachers would be more amenable to proposals like home visits.

3. Match Education tries to advance knowledge where possible with "crisp evidence." 

A downside of that approach is that it adds cost, and it can be weird to start a conversation with a potential partner by saying "We don't know for sure if this works, and we genuinely want to find out."

But our teacher coaching work in New Orleans, thanks to an understanding partner in NSNO, is organized as a randomized trial. 

And our tutoring work with Chicago Public Schools, in partnership with U of Chicago, is set up in a similar manner. 

So researcher-practitioner collaboration can happen.  It's just way too rare.  And frankly, it's scary.  For example, we're currently seeking some big funding to do an RCT on our Graduate School of Education.  We know that Match-trained teachers end up, as rookie teachers, with kids who make unusually large learning gains.  But why?  Is it our selection?  Is it our training?  Or is it the schools that hire them, which have strong track records of their own, entirely independent of us?  Hard to say.  We'd hope such a study would validate all our efforts.  But it could do the opposite.

4. I'll give Tom the last word here:

The Common Core standards and new teacher evaluation policies are a good example.  Although the federal government helped create the policy framework, its ultimate impact will be determined by thousands of implementation and policy decisions at the state and local level.  So, how could we ensure that state and local leaders get the evidence they will need to find the best solutions?

Here’s an outline of one approach, which a group of states and large districts could undertake collectively:

Suppose a group of states were to invite panels of teachers to assemble packages of materials targeted at the most demanding new standards in each grade and subject.  Each package should contain a training component and a feedback component.  For instance, in addition to receiving training and curriculum materials, teachers might be given cameras to submit videos of their lessons teaching the new standards, for comment from peers, principals and content experts.  (Implementing the new standards will require massive adult behavior change.  And any adult behavior change requires feedback.  Postponing the implementation of teacher evaluations would be like launching a Weight Watchers program with no bathroom scales or mirrors for the participants.)

Teams of teachers by grade level and subject in schools would be invited to participate in the trials, from all the participating states.

The states would find a partner to organize the trials to test the packages.  From the volunteer schools, a subset would be chosen by lottery to receive the treatments in specific grade levels and subjects.  (A school might win the lottery in one grade and subject, but not in others.  They would serve as control group schools in the grades and subjects where they were not chosen for treatment.) Randomly assigning treatments at the school/grade level would eliminate the need to analyze student-level data. Assembling and cleaning student-level data accounts for much of the cost and delay in traditional evaluations.

The research would piggy-back on federal data reporting requirements (using school-level and subgroup means by grade and subject rather than student-level data).  That way, the tables could be prepared beforehand and impact estimates could be produced within days of state reporting during the summer following each school year.
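(A purely illustrative coda, since Tom's outline above is prose: here's a minimal Python sketch of the kind of lottery and school-level impact estimate he describes.  The school names, grade/subject cells, and proficiency numbers are all invented, and a real trial would use many more schools and proper standard errors.)

```python
import random
from statistics import mean

# Hypothetical volunteer schools and the grade/subject "cells" in the trial.
# All names and numbers below are invented for illustration.
schools = ["School A", "School B", "School C", "School D", "School E", "School F"]
cells = [("Grade 4", "Math"), ("Grade 4", "ELA"), ("Grade 8", "Math")]

random.seed(2014)

# Lottery: within each grade/subject cell, randomly assign half the volunteer
# schools to treatment.  A school can win the lottery in one cell and serve
# as a control in the others, exactly as the quoted design describes.
assignment = {}
for cell in cells:
    treated = set(random.sample(schools, k=len(schools) // 2))
    assignment[cell] = {s: (s in treated) for s in schools}

# Made-up school-level proficiency rates, as reported by the state after the
# school year -- no student-level records needed.
reported_means = {"School A": 0.61, "School B": 0.55, "School C": 0.70,
                  "School D": 0.48, "School E": 0.66, "School F": 0.59}

def impact_estimate(cell, school_means):
    """Difference in mean school-level proficiency, treated minus control."""
    treated = [school_means[s] for s in schools if assignment[cell][s]]
    control = [school_means[s] for s in schools if not assignment[cell][s]]
    return mean(treated) - mean(control)

cell = ("Grade 4", "Math")
print(assignment[cell])
print(f"Estimated impact in {cell}: {impact_estimate(cell, reported_means):+.3f}")
```

The design choice Tom highlights shows up directly in the code: because randomization happens at the school/grade/subject level, the impact estimate needs only the school-level means states already report, not student-level data.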