Thursday, February 01, 2007

Student Evaluations

The focus on Moneylaw is usually scholarship, and for good reason. At least with scholarship there are things to count, whether downloads, citations, or lines on a resume. There is little discussion of teaching even though, but for the teaching function, we would not have jobs. The lack of discussion, I think, can be traced to the absence of any reliable way to measure teaching effectiveness. It is a credit (although not much of one) to contributors and commentators on Moneylaw that no one has suggested that student evaluations are measures of teaching effectiveness. It is common knowledge that high evaluations can be the result of excellent teaching but can also be the result of easy teaching, funny teaching, weenie roasts and, I suspect, some other things that have not occurred to me.

Could this issue be addressed? I have not kept up with the literature, but I recall efforts in undergraduate math courses to determine the relationship between student evaluations and teaching effectiveness. The teachers all used the same book and gave the same exam. In one study, after holding as many variables as possible constant, there was a negative correlation between exam scores and the students’ evaluations of the teachers. When I taught economics we did the same thing in a principles course. We taught the same chapters and gave a common exam. In our case there was no correlation, although there was a great range in how the students ranked the teachers.

Suppose a school had 4 contracts sections. The professors could agree on the same book and the same coverage and devise a common exam which they also all agreed to grade. (The grade for the experiment could be the average grade from the 4 professors.) With a total of 400 students, you would have 400 evaluations of the teachers and 400 final exam scores. LSAT, GPA, etc., could all be factored out so the focus would be on how much the students “learned” and what the students thought of the professor.
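For readers curious about the mechanics, the test described above amounts to a partial correlation: strip LSAT and GPA out of both the exam scores and the evaluations, then correlate what is left. Here is a minimal sketch in Python; the numbers are invented toy data standing in for the 400 students, not results from any actual study.

```python
import numpy as np

rng = np.random.default_rng(0)  # toy data standing in for the 400 students
n = 400
lsat = rng.normal(155, 6, n)           # hypothetical LSAT scores
gpa = rng.normal(3.3, 0.3, n)          # hypothetical undergraduate GPAs
evals = rng.uniform(1, 5, n)           # student evaluations of the professor
exam = 10 + 0.5 * lsat + 8 * gpa + rng.normal(0, 5, n)  # common exam score

def residualize(y, controls):
    """Remove the part of y explained by the control variables (OLS)."""
    X = np.column_stack([np.ones(len(y))] + controls)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation: exam scores vs. evaluations, LSAT and GPA held constant
exam_resid = residualize(exam, [lsat, gpa])
evals_resid = residualize(evals, [lsat, gpa])
partial_r = np.corrcoef(exam_resid, evals_resid)[0, 1]
print(f"partial correlation: {partial_r:.3f}")
```

A negative value here would mirror the undergraduate math finding; a value near zero would mirror the economics experiment.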

This would be terribly time consuming and there may be some statistical issues to iron out. Plus, I am not convinced that the exam – as opposed to what happens five years from now – is a great indicator of teaching effectiveness. Still, isn’t it about time that someone in our profession took a close look at student evaluations to determine if they tell us anything useful and, perhaps, to determine whether they are actually a disincentive to teaching effectively?


Anonymous Laura Appleman said...


Are you taking gender and race effects into account when you discuss teaching evaluations? There's been much literature, including statistical analysis, which talks about how women and minorities routinely receive lower scores on teaching evaluations, controlling for other variables. This is particularly true when the professors are junior faculty, no matter what field.

2/01/2007 1:46 PM  
Blogger William Henderson said...


Don't you think that the agency problem of professors warrants some attention? When you are given lifetime tenure and your pay, prestige and lateral mobility are all a function of scholarly output, wouldn't most utility maximizing professors have a strong incentive to dismiss teaching evaluations as meaningless?

It is commonplace to recycle teaching notes with little change for years, not because there is no room for improvement but because most law schools don't reward the necessary effort. Similarly, many professors use multiple choice questions and withhold the answer keys (so as to reuse the questions), thwarting substantive feedback. Tradeoffs in favor of professor leisure, and against student welfare, happen all the time.

Analogies between undergraduate studies on teaching evaluations and professional studies should be drawn cautiously. Law students are older and are incurring debt and opportunity cost to attend law school. This is a highly vetted population with options. Why wouldn't they want to maximize the value of their time? At the law school level, what is the empirical basis for saying that students want easy, unchallenging teachers? We need to be careful not to bootstrap our own anecdotal experience into what we want to believe.

Over the years, I have asked several associate deans (who have to read all the student evaluations) about the usefulness of the evaluations. Everyone I talked to has told me that really bad and really exceptional instructors are easy to spot because they have low (or high) numbers, and the students provide consistent (good or bad) remarks in their comments section.

Finally, since this is Moneylaw, why not see how the market responds, at least at the school level? We could regress law school (dummy variables), LSAT and UGPA onto MBE scores and see if some institutions produce higher bar scores.
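The regression Bill describes could be set up along these lines; the sketch below uses invented data and made-up school effects purely for illustration, not real bar results.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_school, n_schools = 100, 3
school = np.repeat(np.arange(n_schools), n_per_school)  # school identifier
n = n_schools * n_per_school
lsat = rng.normal(155, 6, n)
ugpa = rng.normal(3.3, 0.3, n)
school_effect = np.array([0.0, 3.0, -2.0])  # invented "value added" per school
mbe = 60 + 0.4 * lsat + 5 * ugpa + school_effect[school] + rng.normal(0, 4, n)

# Design matrix: intercept, LSAT, UGPA, and dummy variables for schools 1..k-1
dummies = (school[:, None] == np.arange(1, n_schools)[None, :]).astype(float)
X = np.column_stack([np.ones(n), lsat, ugpa, dummies])
beta, *_ = np.linalg.lstsq(X, mbe, rcond=None)

# beta[3:] estimates each school's MBE effect relative to the omitted school,
# holding entering credentials constant
print(beta[3:])
```

The dummy coefficients are the quantity of interest: a school whose students outperform their credentials would show a positive coefficient.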

I suspect that many students with less impressive credentials would gravitate to law schools that maximize their chances of passing the bar. Especially among minority populations, bar passage is a huge vexing problem. Many professors would deride such a choice--but it is not THEIR time or money.

As always, I appreciate your willingness to ask the impolite questions. bh.

2/01/2007 3:01 PM  
Blogger Jeff Harrison said...

Laura: I did not know that, and it presents yet another complication in determining if evals and effectiveness are related.

Bill: I agree with virtually everything you have said. I have some reservations about whether really high evals are evidence of good teaching. I think low ones, though, are indications that something is amiss.

As you suggest, we need to avoid relying on anecdotes, but here is one anyway. A couple of years ago some students were angry about the possibility that classroom laptop use would be limited. They started a blog: Angry Gator. In the course of that, a number of them complained about poor teaching generally. Someone responded by noting that our evals are public and that most professors are ranked as good to excellent. The response from at least some of the bloggers was that students tended to give high evals as part of the "deal" -- don't make my life difficult and I will take care of you on the evals. It did not mean that they thought the teaching was actually good.

I think it is possible the evals do more harm than good.

2/01/2007 3:11 PM  
Anonymous Matt said...

In the math class case, was there any attempt (or any way to attempt) to tell if the low evaluations were correlated with bad grades because students don't like bad grades and the teacher in question was harder, or if, rather, the students got bad grades because the teacher was bad?

2/01/2007 5:05 PM  
Blogger Jeff Harrison said...

Perhaps I misstated this. In the math experiment, the students with high grades rated their teacher low. Those with low grades liked the teachers. The explanation was that the effective teachers were more rigorous in class and gave more homework.

2/01/2007 5:38 PM  
Anonymous Anonymous said...

Assessing teacher effectiveness can be a vexing problem and this difficulty may be one factor that leads many profs to discount it in favor of very tangible (and utility maximizing) scholarship efforts. Further, even if someone really loves teaching, it is possible that what they love about teaching may not be the same as what the next person considers good teaching. People often have very different ideas about what constitutes good teaching.

For some, good teaching might mean students do better on a common test (e.g. the bar exam). However, this can lead to "teaching to the exam" in certain situations in which such common measures are frequent. It is my understanding that this is a problem in K-12, where standardized tests are the rage. This type of learning may provide incentives to stay away from the aspects of teaching that aren't geared toward exam taking - such as practice aspects or the cutting edge of law questions.

It may be that one improvement to student evaluations could be found in the survey instrument. Researchers who use such methods frequently have come up with much better ways of getting to "the truth" than the simplistic questions used in the standard student evaluations.

2/01/2007 7:30 PM  
Anonymous mark fenster said...


I’m surprised to hear you state with confidence that research effectiveness is easily, or more easily, measurable, given the prevailing critiques of law review editors and your trenchant criticisms of the tenure evaluation process. If the editorial process is run by non-professionals, and the professional evaluation process is as deeply flawed as you’ve intimated, I’m not certain how well the legal academy can evaluate research, especially when certain fields of doctrinal and interdisciplinary scholarship are so specialized that a colleague who works in one area can’t be expected to evaluate a tenure candidate’s scholarship. Outside of the very best and the very worst research, I’m not so confident that the academy does a perfect or even great job of evaluating research, given the biases attached to law review placement, flashy glibness, and certain subject matters over others.

As to teaching evaluations, even if I agree with you that student evaluations are significantly less than perfect, I agree with Bill Henderson that for a variety of reasons, law students’ evaluations are more trustworthy than undergraduates’ (and, having taught both types of students and with a partner who teaches undergrads, I think Bill’s correct). Moreover, your use of the Angry Gator folks as reliable narrators about their own predicament doesn’t really help your case (but that’s a local story too complicated to repeat here).

Assuming, then, that student evaluations have some value, the more significant question is the extent to which an institution should care about student evaluations. For instance, imagine we could come up with some reliable measurement of “teaching effectiveness,” and suppose that Prof. A and Prof. B end up with the same score for their identical classes, call it an 8 on an ascending scale of 1 to 10. And suppose, too, that Prof. A gets student evaluations that total an 8 and Prof. B with student evaluations that total a 3. I would argue, uncontroversially I think, that the institution should favor Prof. A, and not just for the market-driven reasons that Prof. A is able to sell herself better as an effective teacher (even if she’s not actually more effective than B). A’s “popularity” can have significant positive externalities for the institution and the profession insofar as it may increase overall student satisfaction in the institution and perhaps have longitudinal effects on alumni giving and loyalty, and it may perhaps even inspire students to enjoy practicing law more than B’s students.

If this is true, and if we ideally would prefer the “effective” and “popular” teacher over teachers who are only “effective” or “popular” (or, of course, neither), and if we have reasons to distrust our ability to measure effectiveness, then where does an institution draw a line? What increment of increased effectiveness are we willing to purchase by giving up what increment of decreased popularity (and vice versa)? Are we willing to still prefer A to B if A’s effectiveness declines a little but their relative popularity remains constant? If so, at what rate of marginal decline should A’s lack of effectiveness become unacceptable, especially if we begin to doubt the effectiveness measurement? This strikes me as indeterminate.

Perhaps the takeaway is that although there are plenty of reasons to distrust student evaluations, it would be foolish for an institution to simply throw them away or to ignore them. Do a better job of designing the survey instrument, obviously, but also, to at least some extent, trust the students. I know in our institution, some of the most “popular” teachers, measured by the number of butts in the seats of their classes every day, are also the most rigorous and, likely, the most effective. And, ironically, not necessarily the ones who one would suspect merely teach down to their students’ worst preferences and tendencies.

2/02/2007 9:47 AM  
Blogger Jeff Harrison said...

Mark has written a comment on my post (or I think it is on my post) about which I have two responses. 1. I agree with much of what he has written. 2. I fear someone reading his comment will think it accurately characterizes my post.

Let's go in reverse order.

a. Not that it matters or should matter, but I do not think, whatever Mark claims I said, that assessing the quality of scholarship is easier or harder than assessing the quality of teaching. I do not know. What I did say was there is, on Moneylaw, a focus on scholarship. My explanation is that you can at least quantify it and then debate what it means. With teaching the discussion cannot even get started.

b. I am not aware of saying we should throw away the evaluations. Really low ones probably do indicate something is amiss. My point is to urge a process of determining what they tell us, if anything. Hence the long boring discussion of possible ways to perform that test.

c. My reference to the Angry Gator was an anecdote, so identified, and supportive of what many if not most of my colleagues have told me. They feel pressure because of student evaluations to alter what they do. Some give in, some don't. They struggle with it as do I. Their perception is that there is a payoff if you "play ball."

When people reply to blogs and misstate what was said, I tend to think the blogger has pushed a button, but since I agree with much of Mark's actual view of evaluations, I am not sure which button was pushed. Maybe he just did not have time.

So, to the first matter. I agree that popular teachers may get high evaluations. As I put it, and Mark must have missed, "high evaluations can be the result of excellent teaching." The problem I have is that they are not always indicative of excellent teaching. Poor teachers may get high ones and good teachers lower ones. So, if I am looking at the student evaluation I am not sure of what the numbers mean about the quality of teaching by that individual.

I also agree that a teacher who is as effective as another but makes it less painful could be regarded as a better teacher. In fact, at the time of the economics experiment I described that was an idea we discussed.

I disagree in two respects. More accurately in the case of one "disagreement" I am not at all comfortable with the idea of teaching style becoming a fund raising tool. Too many times I have heard from a graduate in reference to a law professor, "I really hated that bastard at the time but he really taught me -----." If this professor gets poor evaluations should he be advised to change?

The most worrisome part of Mark's response is this: "I know in our institution, some of the most “popular” teachers, measured by the number of butts in the seats of their classes everyday, are also the most rigorous and, likely, the most effective."

First, I am not sure if "butts in seats" means regular attendance or registering for courses. If it is the former, "butts in seats" could mean the professor is popular or it could mean the professor takes roll. Thus, I am not inclined to measure "popularity" by "butts in seats." If he means to measure popularity by enrollment, that is possible. The problem is that high enrollment can be a signal that the professor does not care if there are "butts in the seats."

He also says that "some" of the popular teachers are the most rigorous and likely to be effective teachers. It is hard to disagree with this but the evaluations do not tell us the identity of "some."

But surely he must mean something other than the obvious -- some popular teachers are effective teachers. Maybe I am reading too much into it but it is possible he is suggesting he knows the identity of the teachers who are popular and rigorous and, I suppose, the ones who are popular and not rigorous. Of course, the student evaluations are the same for both groups and thus, he cannot be relying on them for this insight. It is hardly a good argument in favor of student evaluations.

Is it hearsay? Is it the teachers' own claims? Can't be that, since I can count on one finger the number of teachers who have said to me "I really made nice to the students today." Is it class visitation? I have a hunch it's not that. In twenty years of class visitation I cannot say I ever felt confident enough to say "I know" that one group of students was learning a great deal and another was not.

What Mark's sentence tells me is what we knew already. Law schools do not really make much of an effort to evaluate teaching effectiveness. Instead we are satisfied with limited samples of student gossip, hunches, assumptions, perhaps based on credentials, and broad statements. The only way to "know" is to determine what the students know when the course is over.

2/03/2007 1:53 AM  
Anonymous mark fenster said...

Excellent: We agree. Just as you think I misread your post, very little of my comment was meant as a direct rejoinder to your post; the last paragraphs I think were pretty uncontroversial, and were just my own thinking aloud about whether student evaluations have any value at all.

But here's a follow-up question, asked from ignorance. Why should, or do, tenured faculty care about their student evaluations? Presuming that those evaluations are within a respectable range, what should matter about where they lie within that range? I ask this to note at least one of the bad unintended consequences of tenure: That evaluating teaching seems only to matter for an institution during the pre-tenure period, when (some) institutions act like it matters a whole lot. That lasts 4-6 years. After that brief period, during the decades that follow, there's minimal if any effort to provide any sort of peer evaluations or oversight of senior colleagues' teaching, besides the ritualistic collecting of student evaluations. To my knowledge, merit pay raises -- whether awarded on a regularized basis or in order to retain someone who presents a competing offer from another school -- are tied far more to research productivity than to teaching ability.

I suppose there are ego inflation/deflation issues, but otherwise I'm not sure what the big deal is post-tenure. Which demonstrates a certain lack of accountability -- or, if it is as difficult as you (I think quite rightly) maintain to evaluate teaching, then it shows a lack of caring to appear to be accountable, even if accountability measures are less than perfect. Which then appears to be another significant flaw in the default efforts of most law schools; which then of course would allow some entrepreneurial law school to figure out a better way of evaluating and holding accountable senior, as well as junior, faculty, if that were possible....

2/03/2007 2:17 PM  
Blogger Jeff Harrison said...

You are right, obviously. The whole issue does put the cart before the horse. Post tenure there is little motivation to worry about scholarship or teaching. Pretenure there is more to worry about but some of that can be offset by making enough "nice," going into the easy letter market, being careful when it comes to voting, and credentials.

Still, I think there is a shared feeling among Moneylaw contributors that law schools would be better places if there were more attention paid to actual performance. This would seem to mean that accountability is part of the "MoneyLaw plan." To me, this leads to systems of evaluating scholarship and teaching that are objective and reliable.

That is a great deal to ask, especially when you are asking a cozy establishment that potentially has so much to lose.

I honestly do not know, but is it possible that there are "indicators" of poor teaching that one could get to without essentially testing for what the students know?

My uninformed hunch is that the better teachers do not teach from bar review materials, are willing to teach a new course once in a while, :-) like to change materials from year to year, do not use machine graded multiple choice tests, do not miss classes unless necessary and make them up when they do.

Of course, maybe these just are indicative of a work ethic and intellectual curiosity.

2/03/2007 3:07 PM  
