Revolution Lullabye

August 27, 2014

Newton, Value-added Modeling of Teacher Effectiveness

Newton, Xiaoxia A., et al. “Value-added Modeling of Teacher Effectiveness: Exploration of Stability across Models and Contexts.” Education Policy Analysis Archives 18.23 (2010). Print.

Newton et al. investigate measures of teacher effectiveness based on value-added modeling (VAM) and show that these measures, based in large part on measured student learning gains, are not stable and can vary significantly across years, classes, and contexts. The study focused on 250 mathematics and ELA teachers and the approximately 3,500 students they taught at six high schools in the San Francisco Bay Area. The researchers argue that measures of teacher effectiveness based solely on student performance scores (measures that do not account for student demographics and other differences) cannot be relied on for a true understanding of a teacher’s effectiveness, because so many other unstable variables affect those test scores. Evaluation models that lean heavily on student performance scores can therefore penalize teachers who work in high-need areas, especially those who teach disadvantaged students or students with limited English proficiency.
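The mechanism behind this claim can be illustrated with a toy simulation (this is my sketch, not the authors’ model or data; all numbers and the “advantage” covariate are hypothetical). Three simulated teachers are equally effective, but their classes differ in student background; ranking them on raw gain scores misattributes the background effect to the teachers, while adding the student covariate to the regression largely removes the artifact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three equally effective teachers, 60 students each.
n_teachers, n_per = 3, 60
teacher = np.repeat(np.arange(n_teachers), n_per)

# A student background covariate (e.g., socioeconomic advantage),
# distributed unevenly: teacher 2 is assigned the most disadvantaged class.
advantage = rng.normal(loc=np.repeat([0.5, 0.0, -0.5], n_per), scale=1.0)

# True teacher effects are identical (zero); learning gains depend only
# on student background plus noise.
gain = 0.5 * advantage + rng.normal(scale=0.3, size=teacher.size)

dummies = np.eye(n_teachers)[teacher]  # one indicator column per teacher

# Model A: raw gains regressed on teacher indicators, no student controls.
effects_a, *_ = np.linalg.lstsq(dummies, gain, rcond=None)

# Model B: the same regression with the student covariate included.
X_b = np.column_stack([dummies, advantage])
coef_b, *_ = np.linalg.lstsq(X_b, gain, rcond=None)
effects_b = coef_b[:n_teachers]

print("unadjusted teacher 'effects':", effects_a.round(2))
print("adjusted teacher 'effects':  ", effects_b.round(2))
```

Although every simulated teacher is identical, the unadjusted model ranks the teacher with advantaged students above the teacher with disadvantaged students, while the adjusted estimates cluster near zero — the disincentive to teach high-need students that the authors warn about.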

Quotable Quotes

“Growing interest in tying student learning to educational accountability has stimulated unprecedented efforts to use high-stakes tests in the evaluation of individual teachers and schools. In the current policy climate, pupil learning is increasingly conceptualized as standardized test score gains, and methods to assess teacher effectiveness are increasingly grounded in what is broadly called value-added analysis. The inferences about individual teacher effects many policymakers would like to draw from such value-added analyses rely on very strong and often untestable statistical assumptions about the roles of schools, multiple teachers, student aptitudes and efforts, homes and families in producing measured student learning gains. These inferences also depend on sometimes problematic conceptualizations of learning embodied in assessments used to evaluate gains. Despite the statistical and measurement challenges, value-added models for estimating teacher effects have gained increasing attention among policy makers due to their conceptual and methodological appeal” (3).

Differences in teacher effectiveness in different classes: “An implicit assumption in the value-added literature is that measured teacher effects are stable across courses and time. Previous studies have found that this assumption is not generally met for estimates across different years. There has been less attention to the question of teacher effects across courses. One might expect that teacher effects could vary across courses for any number of reasons. For instance, a mathematics teacher might be better at teaching algebra than geometry, or an English teacher might be better at teaching literature than composition. Teachers may also be differentially adept at teaching new English learners, for example, or 2nd graders rather than 5th graders. It is also possible that, since tracking practices are common, especially at the secondary level, different classes might imply different student compositions, which can impact a teacher’s value-added rankings, as we saw in the previous section.” (12)

“the analyses suggested that teachers’ rankings were higher for courses with “high-track” students than for untracked classes” (13).

“These examples and our general findings highlight the challenge inherent in developing a value-added model that adequately captures teacher effectiveness, when teacher effectiveness itself is a variable with high levels of instability across contexts (i.e., types of courses, types of students, and year) as well as statistical models that make different assumptions about what exogenous influences should be controlled. Further, the contexts associated with instability are themselves highly relevant to the notion of teacher effectiveness” (16).

“The default assumption in the value-added literature is that teacher effects are a fixed construct that is independent of the context of teaching (e.g., types of courses, student demographic compositions in a class, and so on) and stable across time. Our empirical exploration of teacher effectiveness rankings across different courses and years suggested that this assumption is not consistent with reality. In particular, the fact that an individual student’s learning gain is heavily dependent upon who else is in his or her class, apart from the teacher, raises questions about our ability to isolate a teacher’s effect on an individual student’s learning, no matter how sophisticated the statistical model might be” (18).

“Our correlations indicate that even in the most complex models, a substantial portion of the variation in teacher rankings is attributable to selected student characteristics, which is troubling given the momentum gathering around VAM as a policy proposal. Even more troubling is the possibility that policies that rely primarily on student test score gains to evaluate teachers – especially when student characteristics are not taken into account at all (as in some widely used models) — could create disincentives for teachers to want to work with those students with the greatest needs” (18).

“Our conclusion is NOT that teachers do not matter. Rather, our findings suggest that we simply cannot measure precisely how much individual teachers contribute to student learning, given the other factors involved in the learning process, the current limitations of tests and methods, and the current state of our educational system” (20). 

Notable Notes

The problem of variables impacting the calculation of teacher effectiveness: students’ backgrounds (socioeconomic status, culture, disability, language diversity), the effects of the school environment, year-to-year variation in how teachers perform, and the curriculum.

VAM assumes that schools, teachers, students, parents, curricula, class sizes, school resources, and communities are similar.

The variables the researchers collected and measured included CST math or ELA scaled test scores; students’ prior test scores (for both average and accelerated students); students’ race/ethnicity, gender, and ELL status; parents’ educational level and participation in free or reduced-price school lunch; and individual school differences. The study takes a longitudinal view by incorporating students’ prior achievement (7), and the researchers were able to link individual students to their teachers (8).

