As with the ANNR study, I think the method used to calculate the LEVA scores is explained well enough in my original posts. So here I'll just focus on pointing out the grosser flaws in that method. My basic premise for the LEVA score is that the academic index of entering students for each law school might have some relation to the success of those students in passing the bar exam. So we might expect a school's bar passage rate (or rather, the difference between the school's pass rate and its primary jurisdiction's pass rate) to coincide with the academic indexes of students that it enrolls.
If we sort all schools from top to bottom by bar exam "performance" and again by median student academic index, then we might expect the order to be pretty much the same (again, assuming that academic index correlates at all with bar exam success). And we might measure value added (or subtracted) by looking at schools that have large gaps between their positions in these two lists. To see the problems with this approach, let's look again at the numbers reported to the ABA by the Notre Dame (ND) and Thomas M. Cooley (TC) law schools:
NUMBER OF GRADUATES        ND     TC
Awarded JD degree:        184    805
Sitting for bar exam:      69    229
Passing bar exam:          62    183
Bar passage rate:        0.90   0.80
As I pointed out in my earlier post, the first problem here is that schools only report bar exam results for students taking the exam in one or two jurisdictions. For each school we can only see outcomes for 28 - 38% of students who earned J.D. degrees. We have no idea whether the rest of the graduating class even took a bar exam anywhere else, or if they did how many passed. Schools may know these outcomes, but as of today they do not report them to the ABA -- at least not as part of the public dataset that appears in the Official Guide.
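The coverage figures can be checked directly from the table above. The following is a minimal Python sketch, not part of the original LEVA calculation. The function names are made up for illustration, and the bounds rest on a loud assumption: that every graduate missing from the report sat for a bar exam somewhere, which the reports themselves cannot confirm.

```python
# Figures reported to the ABA, copied from the table above.
reported = {
    "Notre Dame": {"jd_awarded": 184, "sat_for_bar": 69, "passed_bar": 62},
    "Thomas M. Cooley": {"jd_awarded": 805, "sat_for_bar": 229, "passed_bar": 183},
}


def coverage(school):
    """Share of JD recipients whose bar outcome appears in the report."""
    return school["sat_for_bar"] / school["jd_awarded"]


def pass_rate(school):
    """Pass rate among the reported takers only."""
    return school["passed_bar"] / school["sat_for_bar"]


def pass_rate_bounds(school):
    """Lowest and highest possible pass rate over all JD recipients.

    ASSUMPTION (illustrative only): every unreported graduate took a bar
    exam somewhere. The low bound has all of them failing, the high bound
    has all of them passing.
    """
    unreported = school["jd_awarded"] - school["sat_for_bar"]
    low = school["passed_bar"] / school["jd_awarded"]
    high = (school["passed_bar"] + unreported) / school["jd_awarded"]
    return low, high


for name, school in reported.items():
    low, high = pass_rate_bounds(school)
    print(f"{name}: coverage {coverage(school):.1%}, "
          f"reported pass rate {pass_rate(school):.1%}, "
          f"possible overall range {low:.1%} to {high:.1%}")
```

The width of the possible range is the point: with roughly two thirds of each class unaccounted for, the reported rate constrains the true rate very little.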
Just as the low response rates on employment surveys at some schools suggest significant selection bias in their results, the scant coverage of these bar passage reports also suggests great caution in using their results for any serious purpose. If we had full results, then we might find that the passage rate for all graduates tracks closely with that of the minority results disclosed. But we might also find that many graduates not covered by the report never even bothered to take the exam, or that they took it and failed at a much higher rate than those who took an exam in the primary jurisdiction.
There may be no good source of full bar exam results for all schools today. Some states disclose pass rates by school for all takers, but I do not know whether all states give out such full results. Even if states have the data, it could be a huge amount of work for someone to aggregate all of it. I also do not know whether most schools collect full results. If they do, then it seems a simple matter for the ABA to request and report data on all graduates.
But using the pass rate raises further problems: 1) Pass rates vary widely by state; 2) The pass/fail criterion is so coarse that it may have only limited correlation with any other factor; and 3) Even with full reports of those who take the exam, pass rates may reflect significant extinction bias relative to the population of entering students. I tried to adjust in a crude way for jurisdictional variance in the LEVA formula, but nothing can overcome the basic coarseness issue. And the extinction bias that plagues the ANNR measure operates here to just as great a degree -- by definition only those students who succeed in graduating can even attempt the bar exam.
Also, the population-level data reported for academic index and pass rate do not allow us to pair inputs and outputs per student -- which we would need to do in order to truly measure how outcomes vary for students with the same indexes at different schools. And I assumed for the purpose of the LEVA measure that an index composed of the median GPA and median LSAT at each school approximates the academic index of the "median" student at that school. But this may be wildly inaccurate, because the medians for each component are reported separately.
If a school accepts many students with "split" numbers, for instance, then it might actually enroll no students at all with a composite index close to one composed of the reported medians. A school might accept 100 students with an LSAT of 170 and a GPA of 2.0, and 100 others with an LSAT of 150 and a GPA of 4.0. And its reported medians might be identical to those of a school that filled its class with 160/3.0 candidates. But these could be very different populations in terms of how well they perform on the licensing exam, even with identical courses of instruction during law school.
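The split-numbers example can be verified with a few lines of Python. This is only a sketch of the hypothetical classes described above. The 200-student size of the second class is an assumption added for symmetry, and no particular academic index formula is assumed: the code simply counts students whose own numbers match the separately reported medians.

```python
from statistics import median

# Hypothetical classes from the example: (LSAT, GPA) per student.
split_school = [(170, 2.0)] * 100 + [(150, 4.0)] * 100
uniform_school = [(160, 3.0)] * 200  # class size of 200 is an assumption


def reported_medians(students):
    """Median LSAT and median GPA, each computed separately."""
    lsats = [lsat for lsat, _ in students]
    gpas = [gpa for _, gpa in students]
    return median(lsats), median(gpas)


def students_at_medians(students):
    """Number of students whose own LSAT and GPA both equal the medians."""
    med_lsat, med_gpa = reported_medians(students)
    return sum(1 for lsat, gpa in students
               if lsat == med_lsat and gpa == med_gpa)


print(reported_medians(split_school))     # same pair of medians...
print(reported_medians(uniform_school))   # ...for both schools
print(students_at_medians(split_school))  # yet no split-school student matches
print(students_at_medians(uniform_school))
```

Both classes report the same medians, but in the split class not a single student actually looks like the "median" student.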
As Jason and commenters on his series mention (and as Andrew Morriss and William Henderson suggest), what we really want are paired input and output metrics, with some rough sort of theoretical correlation, and with sufficiently granular results. With that data we could mark the changes from before to after per student, calculate the median change for graduates from each school, and see how that metric compares between schools.
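As a rough illustration of that calculation, here is a minimal Python sketch. Everything in it is hypothetical: the school names and scores are invented, and it assumes the input and output scores have already been put on a common scale (percentiles, say) so that subtracting one from the other means something. No such paired dataset is known to exist in this form.

```python
from collections import defaultdict
from statistics import median

# HYPOTHETICAL paired records: (school, input percentile, output percentile).
# These values are invented purely to show the mechanics.
records = [
    ("School A", 60, 70),
    ("School A", 50, 55),
    ("School A", 80, 85),
    ("School B", 60, 58),
    ("School B", 50, 52),
    ("School B", 80, 70),
]


def median_change_by_school(rows):
    """Median per-student change (output minus input) for each school."""
    changes = defaultdict(list)
    for school, before, after in rows:
        changes[school].append(after - before)
    return {school: median(deltas) for school, deltas in changes.items()}


for school, change in sorted(median_change_by_school(records).items()):
    print(f"{school}: median change {change:+}")
```

Using the median rather than the mean keeps one or two extreme students from dominating a school's figure, which matters when class sizes differ as much as they do here.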
Jeff Harrison posted recently about how law schools in Brazil apparently figured this out a while ago. If we had a data set with paired LSAT and MBE/MPRE scores for graduates of all schools, then that would make for about the best input vs. output metric we're likely to get. It would be far from ideal, but still an order of magnitude better than what we have now. I'm not sure if LSAC or NALP include MBE scores in their longitudinal studies, but anyone with access to their public use datasets could answer that. If they do, then those data might provide the foundation for something approaching a valid measure of educational value add, at least for an initial study.
Such a study would not give a complete measure of "quality," but it would at least measure effectiveness in preparing students to perform well on the licensing exam and enter practice. And while that is not a complete and sufficient measure of quality in a program, I would argue that it's at least one necessary component. This added to the elements of student engagement suggested by Jason would make for a really valuable and informative ranking of program quality, and one that students could use to make some rational decisions on which school to attend.
Jason cites Russell Korobkin's proposal that the function of rankings as a signalling mechanism is not a bug -- it's a "feature." Korobkin's point is well taken, and many candidates rely on the USN&WR rankings for this very reason. But most candidates also have to choose from a number of schools of similar "signalling value" in deciding where to apply or attend, and this is where it seems that true measures of engagement, effectiveness, and "quality" can inform.
A genuine quality measure, and one that was widely relied upon by candidates, would inspire schools even at similar and very high levels of "prestige" to compete with each other on "quality." And a useful quality metric would allow candidates to make much more informed cost / benefit choices between schools that are otherwise hard to distinguish with existing rankings. That would promote Korobkin's suggested goal of using rankings to encourage schools to produce a public good (high "quality" education) which might otherwise be less available than we would like.
I like Jason's idea of using responses to the Princeton Review surveys to create a sharper view of educational quality than reputation surveys and citation counts. Unlike reputation surveys, citation metrics at least touch something that law schools actually do, but that thing may or may not relate well to teaching law students to be good lawyers. As Jim suggests in his Bibliometric Manifesto, we ought to be able to use quantitative metrics like student survey results as a mirror for qualitative value in legal education -- as long as we have the right metrics.
But there are still questions about any metric based on self-reports of engagement or satisfaction. I wonder, for instance, how much correlation we might find between the lists of schools with the best and worst academic experiences and those with the highest and lowest assigned medians in their grading curves.
At the very least, the actions that schools could take to improve their performance on metrics like these seem like things we might actually want them to do. As we all know by now, metrics do not just inform behavior -- they inspire it as well. And between the two effects -- informing and inspiring -- the latter is most often the greater.
(*) Some of the material here is reproduced or adapted from earlier comments or emails of mine related to Jason's series.