Jonah Rockoff #2

This will be a very technical post. Many teacher prep programs are wondering: on what basis might we deny a license or degree to a teacher after one year of performance? What about after two years?

Obviously, any such system will have the benefit of kicking out some bad teachers, but it will also certainly produce some "false positives": teachers who are actually good but don't appear to be, based on the tests.

This isn't unique to education, obviously. You can only avoid this problem if you never fire anyone.

I discussed this issue as Point #5 in a post entitled Jonah Rockoff. Here's what I wrote:

Let’s say you hired 400 teachers for your school district. And at the end of the year, you fired the 100 rookie teachers in the bottom quartile.

Among the fired, you’d have 15 people who would have been above average during Year 2, and 85 who would have been below average. Economists would say that is a “win” for kids, in the sense that the “Replacement 100” is going to generate a better average teacher. But if you’re one of those 15 teachers, it’s not fair: you were a false positive.

We need Jonah to run this data for the bottom tenth of rookie teachers versus the bottom quartile, because some large teacher prep programs are thinking about making a big push to cut out the lowest tenth.

Jonah emailed me the answer this morning:

Below I’ve drawn up some statistics for simulated data built to match the signal-to-noise ratios found in the actual LA and NYC data. In other words, in both cities, a little less than half of the variation in a classroom’s test-score residuals is persistent teacher value-added, and a little more than half is noise.

Given these parameters, we can create a dataset in which each observation (a teacher) is assigned a “true” value-added and, for any given year, that teacher's measured value-added equals the truth plus noise.

For each decile of measured VA, we can calculate the percentage of teachers falling in each decile of true VA. A teacher’s measured VA in a future year will be centered on their true value-added but will contain noise, so these tabulations are not indicative of the year-to-year stability of measured value-added; rather, they show the predictive power of value-added for discerning a teacher’s expected long-term impact on kids.
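The setup described above is easy to sketch in code. Here is a minimal illustration (mine, not Jonah's actual code): I assume a signal share of 0.4 as a stand-in for "a little less than half," and the teacher count and random seed are arbitrary.

```python
# Illustrative simulation of the setup described above: each teacher gets a
# persistent "true" value-added, and measured VA equals truth plus noise,
# with a bit under half of measured-VA variance coming from the true signal.
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 1_000_000     # large n so the tabulations are stable
signal_share = 0.4         # assumed: "a little less than half" is signal

true_va = rng.normal(0.0, np.sqrt(signal_share), n_teachers)
noise = rng.normal(0.0, np.sqrt(1.0 - signal_share), n_teachers)
measured_va = true_va + noise   # one year of measured value-added

def decile(x):
    """Rank-based decile assignment, 0 (lowest) through 9 (highest)."""
    ranks = x.argsort().argsort()
    return (ranks * 10) // len(x)

true_dec = decile(true_va)
meas_dec = decile(measured_va)

# Of teachers in the bottom decile of one year of measured VA, what share
# are actually in the top half of true VA?
bottom = meas_dec == 0
share_top_half = (true_dec[bottom] >= 5).mean()
print(f"P(true top half | measured bottom decile) = {share_top_half:.3f}")
```

With these placeholder parameters the printed share lands in the high single digits, in the neighborhood of the roughly 9% implied by the first row of the one-year table below; the exact figure depends on the precise signal share.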

First, I look at just one year of measured performance. A teacher in the lowest decile of performance in a given (single) year has close to a 10% chance of being in the top half in true value-added.

Rows are deciles of measured VA in one year (decile1); columns are deciles of true VA (decile_true); cells are the percent of each row.

            |                           decile_true
    decile1 |     0      1      2      3      4      5      6      7      8      9
    --------+----------------------------------------------------------------------
          0 | 40.80  21.02  13.73   9.12   6.18   4.15   2.65   1.47   0.70   0.17
          1 | 21.39  19.23  16.03  12.87  10.23   7.90   5.73   3.79   2.15   0.69
          2 | 13.58  16.03  15.08  13.84  12.02  10.15   8.04   5.97   3.84   1.46
          3 |  9.07  13.04  13.78  13.60  12.70  11.33  10.10   7.97   5.79   2.63
          4 |  6.24  10.33  11.94  12.76  12.80  12.41  11.56   9.99   7.86   4.11
          5 |  4.04   7.89  10.01  11.50  12.40  12.81  12.75  11.94  10.33   6.34
          6 |  2.57   5.75   8.05   9.98  11.48  12.77  13.54  13.63  13.07   9.14
          7 |  1.47   3.91   6.06   8.09  10.09  11.99  13.73  15.17  15.98  13.52
          8 |  0.67   2.12   3.88   5.64   7.93  10.15  12.78  16.18  19.28  21.37
          9 |  0.17   0.67   1.45   2.60   4.18   6.34   9.13  13.89  21.00  40.56

If we use more data, we get more persistence. Those in the bottom decile of value-added measured over two years have less than a 4% chance of being in the top half of true value-added.

Rows are deciles of measured VA averaged over two years (decile12); columns are deciles of true VA (decile_true); cells are the percent of each row.

             |                           decile_true
    decile12 |     0      1      2      3      4      5      6      7      8      9
    ---------+----------------------------------------------------------------------
           0 | 51.08  22.67  12.24   6.87   3.79   1.93   0.92   0.38   0.11   0.01
           1 | 22.82  23.72  18.56  13.49   9.29   6.14   3.54   1.75   0.59   0.11
           2 | 12.30  18.50  18.16  15.94  12.81   9.64   6.58   3.87   1.84   0.36
           3 |  6.78  13.63  15.82  15.91  14.68  12.42   9.76   6.55   3.54   0.92
           4 |  3.72   9.30  12.91  14.45  14.91  14.39  12.72   9.58   6.13   1.88
           5 |  1.90   6.20   9.73  12.43  14.50  15.00  14.58  12.68   9.25   3.72
           6 |  0.91   3.47   6.52   9.76  12.39  14.57  16.09  15.81  13.62   6.88
           7 |  0.37   1.77   3.97   6.69   9.64  12.81  15.73  18.28  18.56  12.19
           8 |  0.11   0.65   1.75   3.52   6.07   9.38  13.43  18.61  23.54  22.95
           9 |  0.01   0.10   0.34   0.93   1.92   3.73   6.66  12.50  22.81  51.00
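The two-year version follows from the same sketch: averaging two noisy annual measures halves the noise variance, which is what tightens the tabulation. Again an illustration with placeholder parameters (signal share 0.4, arbitrary n and seed), not Jonah's actual code.

```python
# Illustrative two-year version of the same simulation: the two-year measure
# is the true signal plus the average of two independent annual noise draws,
# so the noise variance is half what a single year carries.
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 1_000_000
signal_share = 0.4                                        # assumed signal share
true_va = rng.normal(0.0, np.sqrt(signal_share), n_teachers)
noise_y1 = rng.normal(0.0, np.sqrt(1 - signal_share), n_teachers)  # year 1
noise_y2 = rng.normal(0.0, np.sqrt(1 - signal_share), n_teachers)  # year 2
measured_2yr = true_va + (noise_y1 + noise_y2) / 2.0      # two-year average VA

def decile(x):
    """Rank-based decile assignment, 0 (lowest) through 9 (highest)."""
    ranks = x.argsort().argsort()
    return (ranks * 10) // len(x)

# Of teachers in the bottom decile of two-year measured VA, what share are
# actually in the top half of true VA?
bottom = decile(measured_2yr) == 0
share_top_half = (decile(true_va)[bottom] >= 5).mean()
print(f"P(true top half | 2-yr measured bottom decile) = {share_top_half:.3f}")
```

Under these assumptions the share drops to roughly 3%, consistent with the "less than 4%" figure from the two-year table.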

So there you are.

If you axe the bottom tenth of your teachers based on one year of data, you'll be right about 90% of the time and wrong about 10% of the time, where "wrong" here means the teacher is really in the top half of your teachers.

If you axe the bottom tenth of your teachers based on two years of data, you'll be right about 96% of the time.