Monday, September 07, 2009

66% of the Time, Every Time


When I began teaching economics something struck me during the first week. I knew a fair amount about economics -- much less than I thought -- but I had received not even a minute's worth of instruction on teaching. All I could think to do was read the book, more or less explain it in my own words using examples not in the book, and answer questions. There were no war stories for a first-year teacher of microeconomic theory. One thing that gradually occurred to me is that a knowledge of economics, and then later of law, only accounted for about 66% of what I did as a teacher. And it also occurred to me that while students see the professor while he or she is teaching, they only witness about 66% of what goes into teaching.

Other courses, common sense, and day-to-day experiences inform teaching, yet their importance remains behind the scenes. One of the most useful courses I took was a required freshman-level course in logic. I am not sure it is required or even offered any more but it did mean that I do not confuse causation and correlation. It also meant that I do my best to correct students who reason like this: "The professor does not need to take roll because I attend regularly." Bizarre, right? But I have heard the very same "reasoning" from law professors. For example, "There is no need to have a rule requiring professors to take roll because I already take roll." I assume professors finding this acceptable also find it acceptable in class.

And then there was statistics. There I learned the difference between reliability and validity. Reliability means the tool you are using, when applied to the same data, gives you the same result. Validity means the tool is actually testing what you are intending to test. If you've got a tape measure that has been stretched it may consistently measure your waist at 32 inches even though it is 36 inches around. It's just not a valid test of your girth, although it can surely be a source of great happiness.
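The tape measure point can be put in a few lines of Python. This is only a toy sketch: the stretch factor and the function name are made up for illustration, chosen so that a true 36 inches reads as 32.

```python
# Toy illustration of reliability vs. validity using the stretched tape measure.
# STRETCH_FACTOR and stretched_tape are hypothetical names, not from any library.

TRUE_WAIST_INCHES = 36.0
STRETCH_FACTOR = 36.0 / 32.0  # assumed: the tape has stretched by 12.5%


def stretched_tape(true_length):
    """Return the reading a stretched tape gives for a given true length."""
    return true_length / STRETCH_FACTOR


# Measure the same waist five times with the same stretched tape.
readings = [stretched_tape(TRUE_WAIST_INCHES) for _ in range(5)]

# Reliable: the same input always produces the same reading.
is_reliable = len(set(readings)) == 1

# Valid: the reading matches what we actually meant to measure.
is_valid = readings[0] == TRUE_WAIST_INCHES

print(readings[0], is_reliable, is_valid)
```

The tape is perfectly reliable and still not valid, which is the distinction in a nutshell.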

The meaning of a normal distribution also came up and can be understood in the context of reasoning I have heard twice lately: "My method of testing is valid because it produced a normal distribution." I most recently heard this from someone administering a law exam to people with widely varying knowledge of English. The normal distribution means nothing about the validity of the test. My guess is that what she was testing was the ability to understand English. The normal distribution fixation is particularly odd. If the students in the class are normally distributed then, hopefully, the test result will reflect that. On the other hand, getting a normal distribution does not mean the same is true of the class itself. In fact, a normal distribution could just as easily cause concern about the test. Normal distributions are, however, convenient when grades must be assigned.

And now back to logic. Remember your high school math classes. Some teachers said to show your work and then gave you credit if you got everything right except, say, the final step. Others just machine graded. The problem is this. In most complex math problems there are many ways to get a wrong answer. Some reveal that the test taker did not have a clue. Some reveal that the test taker forgot to carry the one on the last step. The machine grader gives them the same credit although their knowledge and understanding are quite different. The teacher who requires the student to show his or her work makes a distinction because there is a distinction. Of course, the same is true in law where the issues are not simply complex but more nuanced.

This also relates to the point that students see only about 66% of what goes into teaching. Suppose you give a machine graded exam and there are 10 reasons that could explain a wrong answer. If most of the students are getting it wrong for the same reason, it suggests an opportunity to improve one's teaching the next term. (Unless, of course, the goal is not really to teach but to get a good distribution.) I assume the machine graded test givers just plow along without pinpointing the problem, which may reflect their teaching as much as student diligence.

The all-time prize for irrational testing actually goes to essay test givers who say something like "Answer 3 of the next 5 questions." There are many combinations of 3 out of 5 and each one represents a different test. In addition, a student could get an 80 of 100 on all five and do worse than a student who scores an 85 on three but would have scored a 60 on the other two. Pretty simple, right? This is, however, popular with the students and you know where that can lead.
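For anyone who wants to check the arithmetic, here is a rough Python sketch. The two score lists are the hypothetical students from the paragraph above, and the helper names are invented for the illustration. It assumes each student picks the three questions he or she would score best on.

```python
from itertools import combinations

# Hypothetical per-question scores (out of 100) on the five questions.
student_a = [80, 80, 80, 80, 80]  # solid on everything
student_b = [85, 85, 85, 60, 60]  # strong on three, weak on two


def best_three_average(scores):
    """Average over the best 3-question subset the student could choose."""
    return max(sum(subset) / 3 for subset in combinations(scores, 3))


def full_average(scores):
    """Average if the student had to answer all five questions."""
    return sum(scores) / len(scores)


# Each choice of 3 questions out of 5 is, in effect, a different exam.
num_distinct_tests = len(list(combinations(range(5), 3)))

print(num_distinct_tests)                                            # 10
print(best_three_average(student_a), best_three_average(student_b))  # 80.0 85.0
print(full_average(student_a), full_average(student_b))              # 80.0 75.0
```

Under "answer 3 of 5" the second student comes out ahead, 85 to 80, while on the full set of questions the ranking flips, 75 to 80.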

I would not want to confuse causation and correlation but there is a pattern. All of the reasoning that, at least to me, seems in error does make the lives of those making the errors easier. Could it be that reasoning is driven by convenience and self-interest?


4 Comments:

Blogger drago said...

As you say, correlation is not causation.

A machine-graded test might score two students the same despite their mistakes being quite different. For the student who did not carry the one during one step (be it the first, last, or anywhere in the middle), s/he is being penalized just as much as the student who does not use the correct formula, for example. One might try to argue that the penalty is unfair, because it equates these two mistakes, and any other mistakes made, including something as simple as having the wrong starting values because there was a smudge on the page.

However, there are many real-world situations where the mistake, however insignificant, will still produce terrible results. In these situations, the ‘smaller’ mistakes read as not just careless but as ridiculous, especially if they could be prevented or caught before the mistakes caused a larger issue.

For example, in 1999, NASA reported that human error caused the loss of the Mars Climate Orbiter spacecraft [cost $125 million]. The mistake here was, embarrassingly, using two systems of units – imperial and metric – without conversion. Now, had a mistake that caused a crash been made in a more complex part of the Orbiter’s design and function, wouldn’t it have cost just as much? Yes, it would. $125 million – for a simple mistake. $125 million – for a complex mistake. Imparting this idea to students (especially students in senior or junior year) is not necessarily a bad thing.

Hopefully, though, a machine-grader would not necessarily be a machine-reviewer. By that I mean, the teacher would hopefully still take time to look at the tests and the work shown to see what the students progress in or need help with.

For the “Answer 3 of 5” test-givers, the benefit is given to a student who is not only knowledgeable but is also aware of the knowledge s/he can best represent in the given time limit, if there is one. A person who chooses a question that they do not know much about may have been able to choose a different question to gain a better score, but the fact that s/he did not make that choice might point to a lack of self-awareness. Of course, the test might not be a good gauge of how much the student knows in relation to other students, since there are many combinations available.

On the other hand, what if the test asks questions that demand the same kind of reasoning and understanding of the material, so the test allows each student to pick based on his/her interests? A good example is compositions in other languages. The teacher might allow a student to choose between three essay topics that are all, by the nature of the question, completely personal as far as knowledge goes. If the test is attempting, however, to gauge a student’s understanding of the different past tenses of verbs and time words, the question(s) the students chose would not necessarily be relevant.

[p1 of 3]

9/08/2009 12:27 AM  
Blogger drago said...

Of course, the primary 3-of-5 question tests I have experienced were from history or English courses, where this particular variation was not necessarily true. (Answering a question about the complexity of religion and statehood in European history circa 18th century would require an entirely different set of knowledge than exploring the effects of slavery on indentured servitude in the American colonies.)

As far as deciding if something—be it a testing method or otherwise—is logical or not... well, I do have a problem with the term ‘logic,’ as it is used in various ways and does not possess the same meaning (necessarily) from history to common sense to mathematics. Many people use the word ‘logical’ to mean ‘that makes sense.’ Unfortunately, logic does not always make sense.

For example, there is the 3-door conundrum from the joys of discrete mathematics. Let us assume you are a contestant on a game show, given a set of 3 doors from which to choose. Supposedly, there is a prize behind one door, and the other two doors have goats behind them. (Why goats? I’m not entirely sure. But, you apparently don’t win a goat if you pick a door with a goat behind it.) After you select a door, the game show host reveals one of the doors that does NOT have a prize behind it. You are then allowed either to keep your original selection or to choose the other still-closed door.

The contestant might think, “Logically speaking, what is the probability that your original door choice has a prize behind it?”

If you answered 1/3, you’re correct. Even though you know that one door you did not select is incorrect, you didn’t know that when you chose the door originally. Therefore, the probability remains the same. The best strategy for a contestant, logically speaking, would be to choose the other closed door because the probability of that door having a prize is 2/3, and the door currently selected has the probability of 1/3.

While this is logically (and discretely) true, it doesn’t make sense, or, rather it seems counter-intuitive. I am sure many people will want to look up this famous math problem – if you do, its name is The Monty Hall Problem.
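For the skeptical, the 1/3 versus 2/3 split can be checked by brute force. The following is a rough simulation sketch in Python, not a proof. The function names, the number of trials, and the fixed seed are arbitrary choices for the illustration, and it assumes the host always opens a goat door that the contestant did not pick.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won when always switching or always staying."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Staying should win roughly a third of the time and switching roughly two thirds, which matches the reasoning above even if it still feels wrong.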

So logic, unfortunately, has been bandied about in many circles to mean something that it does not mean... and I think that the major issue with viewing tests as valid mechanisms by using analytical tools and statistics lies in this issue. Someone might think that “getting a normalized distribution” proves that the test is valid. However, given the size of the class, or the type of test, this might point to the test being too hard (assuming the normalized distribution includes a requirement of at least one student failing) or too easy (assuming the normalized distribution includes a requirement of at least one student getting (near to) a perfect score). But, since the terms have been used in different ways and their meanings made unclear, people use them to define validity, albeit incorrectly.

However, if you believe it is the goal of a test to properly and objectively review a student’s acquired knowledge and understanding of course materials, I would venture the idea that all tests (and even grades) are driven by self-interest and convenience. After all, learning disabilities and nervousness as well as any number of additional human factors, such as hunger, grief, low blood sugar, and even certain kinds of happiness, can negatively affect a student’s performance. In this way, the test is not accurately reviewing what students know. Furthermore, since language is not necessarily concrete in meaning (nor are mathematical definitions necessarily concrete when expressed in language), a student may be penalized for misunderstanding the questions on a test, despite having the appropriate information to answer said questions had the meaning been clearer.

[p2 of 3]

9/08/2009 12:28 AM  
Blogger drago said...

Many people teach students that tests such as the SATs require “strategy.” That is, to me, proof that test-taking is a skill. This means that some students might have plenty of knowledge and understanding, but they lack the skill to “prove” they have this knowledge. Or, more specifically, they lack the particular test-taking skills to appropriately convey the knowledge that they have. Since tests vary from course to course and teacher to teacher, test skills do need to be changed and flexible. On the other hand, there are students who do not have an appropriate amount of knowledge from the class (usually called “passing”), but these students have the skill to “prove” the knowledge they do have and then guess, if possible, the knowledge they do not.

Despite these factors, teachers and professors still use tests (and sometimes paper-writing) as a way to see “how much you have learned.” This is said as if it is an intuitive statement. It is not. Having had a rather particular English/Grammar teacher in elementary school, I had no problem identifying sentence syntax and structure in middle school and beyond. More than that, I could correct poor grammar and use appropriate grammar and define the different structures I used with the correct terminology. This led to high test scores; however, to claim that this represents “how much I have learned” is ridiculous, since I learned either little or no new material. A student who was just breaking into syntax and grammar on such a scale might have test scores that were quite low, or that fell within the “average category,” but she/he has acquired far more new knowledge than I did from that particular class.

I say that this is driven by convenience and self-interest specifically because many people still use tests to evaluate students, even though these pitfalls are known. Just as students are required to "prove" they have learned materials, teachers are obligated to "prove" that they are objectively grading students.

Should a teacher be challenged on a grade, it is much easier to “prove” a student’s grade if there are tests, homework papers, and other grades that people somehow assume determine that the student “is learning” that add up, in some manner, to the final grade in question. A teacher who relies on things that seem too subjective, such as class participation, or non-documented performance (as in, presentations or verbal interaction requirements), would not have such luck “proving” the validity of the grade the student earned. In interest of protecting oneself from seeming bias or even simply a lack of clarity, tests are a prime way of measuring knowledge.

Might I point out that these things are done, ironically, in the name of "fairness." A student "proves"... a teacher "proves"... through the mechanism of a test. However, should we (as a society, I suppose) be evaluating this mechanism's validity? Since it is such a cornerstone of our education system, you would think we would.

Mount Holyoke College (along with a number of LACs) lists SATs as optional. They have, as of yet, found that there is little difference in the academic performance of submitters and non-submitters. In the first year, "Overall, 24% of our applicants chose not to submit their scores and these students were represented in all admission rating categories; in fact, close to 20% of our top rated students were non-submitters."

Since this research was started in 2005, there is more being done to see if the SATs are an appropriate gauge for students. But it has already come under question.

What is a good way of measuring knowledge? Unfortunately, while necessity is the mother of all invention, the ability to “prove” objectivity is the non-gendered parental unit of keeping the status quo. (After all, the ‘best case’ you can make generally relies on your ability to prove you maintain the ‘status quo.’) I can’t tell you a good way of measuring knowledge, or even intelligence. The controversy surrounding the IQ test is proof of that.

drago

9/08/2009 12:28 AM  
Blogger Jeffrey Harrison said...

Thanks, Drago. Interesting comments. I have used the word "logic" somewhat loosely but in the door example -- one that drove me crazy for a few days -- once you "get it" it all seems quite logical. Nevertheless, perhaps I could have used "rational" as meaning the law of transitivity in some instances. You are right that small and big mistakes may lead to the same unfortunate outcome but if knowledge were assessed only on the basis of that outcome we would not know much about the test takers.

9/08/2009 9:51 AM  
