Source data: NYC Dept of Ed. Alternate figure including error bars.
Another figure suggesting that DOE percentile rank (based on value-added score) does not correlate with teacher experience. (If it did, you would expect the peak of the distribution to move to the right for more experienced teachers and to the left for less experienced ones.)
If the term NYC Department of Education ever crosses your lips, there is a good chance you’ve heard about the recent hullabaloo over the public release of NYC teacher data reports (Dennis Walcott Op-Ed, NYT, WSJ). Notice I say data here, not rankings, even though that’s the headline most everywhere. Critics are right to caution about taking these rankings too far (or anywhere maybe). I personally feel that the publication of names in the manner that has occurred is unfair. But oh, free data! I was not one of the researchers who got anonymized data ahead of time, so this has been my first glance at this trove.
The question that grabbed me was this: how do the DOE performance metrics vary with teacher experience? In my first pass through the three annual data sets, I noticed that the 2007-2008 set contained more fine-grained experience groups (see chart; the later sets lump teachers into 1, 2, 3, or more than 3 years), so I based my analysis on the 2007-2008 data. The answer turned out to be quite subtle.
For starters, I decided to look at math results only (personal choice). I removed all teachers for whom years of experience was “unknown” or who were listed as “co-teaching.” I also removed any teachers whose total student count was less than 20. The remaining list does contain duplicates of individuals, since a teacher who teaches both 7th and 8th grade, say, appears once for each grade. But my interest was in averaging the performance numbers by experience group anyway, so I ignored this duplication issue.
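For the curious, the filtering above is simple to reproduce. Here is a minimal pandas sketch on a toy table; the column names (`subject`, `experience`, `role`, `num_students`) are my own invention, and the actual headers in the DOE release will differ.

```python
import pandas as pd

# Toy stand-in for the DOE file; column names are hypothetical.
df = pd.DataFrame({
    "subject":      ["Math", "Math",    "ELA", "Math",        "Math"],
    "experience":   ["1",    "unknown", "2",   "3",           "2"],
    "role":         ["teacher", "teacher", "teacher", "co-teaching", "teacher"],
    "num_students": [25, 30, 40, 22, 15],
})

# Keep math only, drop unknown experience, co-teaching, and small classes.
filtered = df[
    (df["subject"] == "Math")
    & (df["experience"] != "unknown")
    & (df["role"] != "co-teaching")
    & (df["num_students"] >= 20)
]
```

Only the first toy row survives all four cuts, which is exactly the behavior described above.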
NYC DOE released both actual student gains on proficiency tests (post-score minus pre-score) and something called “value-added score”. The value-added score is meant to distinguish student groups that simply performed as projected from those that over- or under-performed expectations. In their own words, “on the 07-08 reports, value-added = actual gain – predicted gain.” A bit more on what predicted gain means below.
Here’s the meat of the matter: you might expect that if you average over hundreds of teachers in any random group, the average gains (z-scores) will wash out to zero. But if teachers get better with experience (a reasonable assumption, no?), then maybe you would see a trend if you grouped teachers by years of experience. It turns out that in terms of actual gains for the year 2007-2008 and for multi-year aggregate data in that report year (blue and green lines), the data do support the idea that teachers get better. Although there is a suggestion that after 10 years, there may be a bit of a slump…
On the other hand, if you take the DOE value-added score (red and purple lines), there is no such indication. (Standard errors on these points are large enough that the data are consistent with a flat trend-line.)
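The comparison behind both charts is a group-by-experience average plus a standard error on each mean. A minimal sketch with a fabricated nine-teacher sample (real groups contain hundreds of teachers, and the numbers below are chosen only to mimic the qualitative pattern: rising actual gains, flat value-added):

```python
import pandas as pd

# Fabricated mini-sample; three teachers per experience level.
df = pd.DataFrame({
    "years_experience": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "actual_gain":  [-0.2, 0.0, -0.1, 0.0, 0.1, 0.2, 0.1, 0.3, 0.2],
    "value_added":  [0.05, -0.05, 0.0, 0.1, -0.1, 0.0, -0.05, 0.05, 0.0],
})

# Mean and standard error of the mean, per experience group.
summary = df.groupby("years_experience").agg(
    mean_gain=("actual_gain", "mean"),
    sem_gain=("actual_gain", "sem"),
    mean_va=("value_added", "mean"),
    sem_va=("value_added", "sem"),
)
```

If the mean value-added per group sits within a standard error or two of zero across all groups, the data are consistent with the flat trend-line described above.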
So what does that mean? How could teacher value-added not improve with teacher experience?
There may be something going on with the DOE’s “predicted gain” estimate*. Their Teacher Data Report FAQ from 2009 does say that teacher experience is factored in when comparing teachers to peer teachers:
The “peer comparison” sections of the report are different from the “citywide comparison” sections […] the predicted gain in all peer comparison calculations takes into account the teacher’s experience overall and in that grade and subject.
I don’t think this is what’s happening here, though, and discussions of value-added elsewhere don’t mention teacher experience as a factor. Since this is a citywide comparison, the predicted gains should not take teacher experience into consideration, and more experienced teachers should therefore show higher value-added scores (though parents and principals should accept that new teachers are still developing mastery). If the DOE is giving new teachers a boost by lowering their predicted gains, it is not really doing a service to anyone. Adjust for students, yes. Do not adjust expectations for the limited experience of the teacher.
Update: Based on this much more definitive report describing the details of the NYC value-added model, I understand that teacher experience is definitely a variable in the model, though the report does not say how much. This fact might obviate my comments below.
The purpose of using value-added scores, as I mentioned, is to level the playing field when comparing teachers with very different student groups. But in averaging hundreds of teachers together by years of experience, the “unequal field” effect should average away. So the actual gains and the value-added scores should show the same trend with years of experience. But they don’t.
One possibility is that 1st and 2nd year teachers are given harder teaching assignments than their more senior colleagues. There may well be seniority privilege, and that could explain how lower actual gains in this group turn into average value-added scores. But this is not a comforting way out, because then it appears that years of experience really have no effect on teacher “value-added.”
*The astute observer may notice that (as expected) actual gains are the same for 07-08 and multi-year for teachers with 1 year of experience, whereas this is not the case for value-added scores. This is because the predicted gains factored into the teacher value-added score are based on the multi-year information even though the actual gains are not.