
Sargent, C et al --- "Empirical evidence that formative assessments improve final exams" [2012] LegEdDig 35; (2012) 20(3) Legal Education Digest 7


Empirical evidence that formative assessments improve final exams

C Sargent, C Springer and A A Curcio

Journal of Legal Education, Vol 61, 2011-2012, pp 379-405

Law school may be one of the few spots on campus still using a comprehensive exam for the entire course grade, even though many have called for an end to this single assessment model. Pedagogical scholars have suggested that law professors begin using formative assessments, assessments designed to provide students with feedback throughout the semester, arguing that giving regular feedback enhances student learning and performance. Despite the call for more feedback and guiding assessments in legal education, the single end-of-semester exam remains the norm, so the impact of formative assessments on law students’ learning remains largely unexplored.

Formative assessments seek to increase learning and motivation by offering students feedback about gaps between current and desired levels of performance.

Feedback is more effective when it provides details of how to improve and explains why an answer is correct. Numerous studies suggest that feedback may be more effective if ungraded because students tend to focus on grades, not suggestions for improvement.

Turning to student responses to the feedback given, formative assessments do not just fix misconceptions and knowledge lapses, they potentially change student motivation and study strategies. By giving students information about shortfalls early in the course, they have the opportunity to adjust and improve, potentially inspiring more effort. Feedback can also suppress motivation. For example, formative assessments that compare a student’s performance to that of his or her classmates may inhibit learning because when faced with such comparisons ‘people who perform poorly tend to attribute their failures to lack of ability, expect to perform poorly in the future, and demonstrate decreased motivation on subsequent tasks.’ Not surprisingly, feedback promotes learning best if it is received mindfully, if students accurately perceive what they do not know, and if they are motivated to fix the problem. In summary, not all students are helped equally because feedback effectiveness turns not just on the materials provided, but also on the ability of the recipient to digest and use the feedback, as well as their goals, self-confidence, interest, and intentions.

One earlier study in a first-year civil procedure class found that multiple practice essay questions followed by annotated model answers helped students with above-the-median Law School Admission Test (LSAT) scores and undergraduate grade point averages (UGPAs) break down legal rules and perform a complex factual analysis on a final exam. The current work builds on that study, which compared two sections of a required civil procedure course: one giving students practice essay exam questions accompanied by various forms of feedback throughout the semester, and a control section with no formative assessments. One weakness of that study was that, although both classes took the same traditional cumulative final exam, they were taught by different instructors. The present investigation eliminates the instructor confound because the same instructor taught both groups.

One surprise from the prior work was that practice essays only helped students with above-the-median LSAT scores and above-the-median UGPAs. The study’s authors attributed this to students’ different metacognitive abilities, i.e. students’ ability to identify what they did wrong (self-observe and self-judge) and develop and implement strategies to fix their weaknesses (self-react). The cognitive psychology literature claims that students with stronger metacognitive skills are better able to use the information gleaned from feedback and apply it to new exam question scenarios and that these abilities can be developed. The present study adds self-reflective exercises to help students understand the specific gaps in their analysis and reasoning, hopefully strengthening their ability to self-observe, self-judge, and self-react.

The prior work used a t-test to examine differences in exam scores between the two groups, one with practice essays (intervention) and the other without (control). Although the t-test identified a statistical difference between the two groups, it could not identify how much of the difference was associated with any particular causal variable – in other words, how much of the exam score difference was attributable to the intervention versus other variables such as law school predictors or other academic behaviours. In this study, we use a regression model allowing us to look at variables other than the formative assessments that might predict the exam score differences, such as motivation to use feedback, law school grades, or law school grade predictors.
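To make the contrast with the earlier t-test concrete, the regression approach can be sketched roughly as below. This is a minimal illustrative sketch in Python using statsmodels, assuming hypothetical column names (common_score, intervention, lsat, ugpa, lgpa, submitted_reflection) and a hypothetical data file; it is not the authors' actual code or model specification.

import pandas as pd
import statsmodels.formula.api as smf

# One row per student; all column names here are illustrative assumptions.
# common_score: raw score on the common final exam questions (0-50)
# intervention: 1 for the 2009 formative-assessment section, 0 for the 2008 control
# lsat, ugpa, lgpa: admission predictors and first-year law school achievement
# submitted_reflection: 1 if the week-five self-reflection was turned in (motivation proxy)
df = pd.read_csv("evidence_scores.csv")  # hypothetical file name

model = smf.ols(
    "common_score ~ intervention + lsat + ugpa + lgpa + submitted_reflection",
    data=df,
).fit()

# Unlike a simple t-test of group means, the coefficient (and t-statistic) on
# 'intervention' estimates the formative-assessment effect while holding the
# other predictors constant.
print(model.summary())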

Participants were students enrolled in a second-year required Evidence course in spring 2008 (n=67) and spring 2009 (n=51), both taught by the second author at a second-tier urban public law school with a diverse student body.

The 2008 Evidence students were the control section. They were taught using a problem method supplemented by case analysis. Those students had one cumulative final exam counting as the full course grade. The 2009 Evidence students, the intervention group, were taught using the problem method supplemented by case analysis, but also received a series of formative assessments, including five ungraded quizzes and a graded midterm. Following the quizzes and midterm, students were given model answers and grading rubrics and were asked to engage in reflective exercises to help them calibrate their comprehension and prepare for the cumulative final exam. The final exam in the intervention section counted for 83 per cent of the course grade because the professor wanted the course grade to correspond primarily to the summative assessment, the comprehensive final.

All students, control and intervention, were taught using the same casebook and other materials. The only difference in substantive coverage was that students in the intervention group had approximately five fewer hours of in-class case and hypothetical problem analysis so that they could complete or review the quizzes, reflective exercises, and the midterm.

In the intervention group, at the end of weeks one and three, students were given an in-class ungraded timed quiz, followed by class time to compare their responses to a model answer, a grading rubric, and a brief self-reflective questionnaire. The professor asked the students to turn in their self-reflective questionnaire but they were not required to do so. At the beginning of week five, students took a three-question ungraded short-answer take-home quiz, which was due the next class. Again, students had class time to review a model answer and grading rubric and were asked to complete a longer self-reflection. Students received an extra raw score point toward their total grade if they turned in this self-reflection. At the end of week seven, the professor gave students a multi-issue in-class ungraded quiz accompanied by a model answer, grading rubric, and self-reflective questionnaire. For this quiz, students were asked to peer edit a classmate’s responses during class and self-edit outside of class. At the end of week eight, students took an open-book in-class timed graded midterm consisting of one multi-issue short essay question (350 words) and three short answer questions (150 words). At the end of week ten, the professor returned the students’ midterm with comments, a grading rubric, and a model answer which also contained information about common errors and how to avoid those in the future. Students were asked to grade themselves using the rubric and to complete a set of self-reflective questions before they were given their graded exam answers. At the beginning of the last week of class, week fourteen, students received the last take-home ungraded quiz with one multiple-issue short essay question and two short answer single issue questions. A model answer and grading rubric were also distributed along with a very short set of self-reflective questions.

Students’ undergraduate grade point average (UGPA) was used to measure prior general academic achievement. Law school grade point average (LGPA) was used to measure achievement in the first year of law school. These variables were used to predict exam performance in the regression model.

Both exams had two short essay questions (of approximately 400 words each). The essay questions in 2008 were similar to, but not identical to, those used in 2009. The remaining questions were eighteen (in 2008) or fifteen (in 2009) short answer questions (approximately 125-150 words each). Eleven short answer questions, worth four or five raw points each, were the same between the two years.

The eleven common questions were graded using the same rubric for both years.

Students in the intervention section were given one point for turning in their self-reflection exercises in week five of the course. Nine students did not turn in these materials. The regression model included a variable indicating whether each student submitted the self-reflective exercises, to control for motivation or interest in formative assessments.

As hypothesised, we found that formative assessment experience (the intervention) was significant in explaining the variance in common question scores for students in the intervention group with at or above the median LSAT scores (t=2.539, p=0.013) but not for those below the median (t=-0.280, p=0.782). The effect size for the top two-thirds of the intervention class, 4.595 points out of 50 or almost a full letter grade (9.19 per cent), is moderate to large.
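In raw terms, the reported percentage is simply the estimated gain expressed against the 50 points available on the common questions: 4.595 / 50 ≈ 0.0919, i.e. 9.19 per cent.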

Re-running the above regressions with the participants split in roughly the same proportion, top two-thirds versus bottom third by LGPA and by UGPA, we found that experience with formative assessments was significant only in predicting common question scores for students in the top two-thirds of the class based on UGPA (t=2.202, p=0.030). For students in the top two-thirds by LGPA, the result approached significance (t=1.807, p=0.074).
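A corresponding sketch of the subgroup analysis, again using the illustrative column names from the earlier sketch rather than the authors' own code, would simply re-fit the regression within each subgroup and read off the intervention term:

import statsmodels.formula.api as smf

median_lsat = df["lsat"].median()
subgroups = {
    "at or above median LSAT": df[df["lsat"] >= median_lsat],
    "below median LSAT": df[df["lsat"] < median_lsat],
}
for label, sub in subgroups.items():
    fit = smf.ols("common_score ~ intervention + ugpa + lgpa", data=sub).fit()
    # t-statistic and p-value on the intervention dummy for this subgroup
    print(label, round(fit.tvalues["intervention"], 3), round(fit.pvalues["intervention"], 3))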

Comparing the control and intervention scores on the common final exam questions, formative assessments improved performance for a majority of students taking a second year Evidence course. Splitting the class into the top two-thirds and bottom one-third by LSAT scores and UGPAs, we found that the effect was concentrated with students in the top two-thirds, regardless of their first-year law school grades. Thus, the benefit inured to students with both above and below the median law school first year grades. The effect size for those who benefitted was moderate to large, just over nine percentage points.

The prior study found that practice essays helped those with above median law school predictors (UGPA/LSAT). This work replicates the prior study and expands the effect to a large portion of the class (two-thirds). The current work also goes beyond the prior study, which only looked at grade predictors, not actual law school achievement. Thus, it did not look at whether the formative assessments helped students with below-the-median law school grades. This study demonstrated that formative assessments had a positive impact on students with below-the-median first year law school grades, as long as those students were not in the bottom one-third of the class in terms of either UGPA or LSAT score.

In the hope of expanding the reach of the formative assessments to a larger segment of the class, this work varied from the earlier study by using a series of short essay and short answer practice exam questions instead of longer essay questions, providing time in class to complete the assessments, including a grading rubric with all but one of the formative assessments, and adding self-reflective question exercises after all assessments. However, we do not know specifically which of the new resources prompted higher exam scores in the intervention section.

What is discernible from the study’s results is that 70 per cent of the intervention group benefitted substantially (nearly a letter grade) from the formative assessment materials. Unfortunately, 30 per cent did not or could not use the materials to monitor and improve the quality of their work against instructor standards. There are a number of potential explanations for why some students benefited more than others from the practice tests and self-reflective exercises.

One reason stems from the fact that not all students are able to use feedback to improve. Information on gaps between current performance and desired standards is considered feedback ‘only if used to alter the gap’. LSAT scores and UGPA may reflect experience with successfully using feedback to improve test scores and an ability to use feedback to narrow achievement shortfalls.

Another reason for the lack of effect for the lower scoring LSAT/UGPA students may lie in difficulty in perceiving the feedback messages or calibrating their comprehension. Students’ ability to identify what they know and don’t know is a metacognitive skill. LSAT scores and UGPA may reflect stronger metacognitive abilities and thus stronger abilities to identify lapses in knowledge and understanding.

Another explanation for the greater verifiable impact of formative assessments on the top two-thirds of the students (measured by LSAT scores and UGPA) is that these students may focus more on grades or scores as a critical measure of success, and therefore are more thorough and diligent in using the materials to maximise scores. College students typically adopt surface, deep, or strategic approaches to learning, and these approaches can affect their academic outcomes. The main goal of deep learners is to learn and understand; surface learners complete required tasks but without interest in learning; and strategic learners attempt to get high grades, avoiding activities that jeopardise scores and maximising activities that improve scores. A study of law students found that those who focused on achieving high marks had higher LSAT scores. Accordingly, those with higher LSAT scores may be more willing or able to ‘slavishly copy the exemplars’ to achieve their grade goals than those who benefitted less from the materials.

Our results may also reflect that diligent use of formative assessments is dependent on a student’s sense of confidence that the materials will help them. Most law students never get feedback after a final exam, and few review their final exam answers. So, the reasons underlying their grades often remain a mystery. Reducing uncertainty about how one can achieve good grades may lead to higher motivation, more efficient studying strategies and greater confidence that studying harder will produce better grades.

The unequal class time between comparison groups may have suppressed some of the learning effects. The formative assessments were completed and/or reviewed during class, taking about five total hours, so the control group had more class hours to spend on course topics. In other words, the feedback didn’t just have to be helpful; it had to be more helpful than additional class time.

Although this study indicates that formative assessments improved student performance, in particular for those with higher LSAT scores/UGPAs, there may be a Hawthorne effect, i.e. students did better because they knew that their performance was being studied. The second author created the study materials after the 2008 course was complete, so only the 2009 students were aware that their performance was being monitored. Thus, the 2009 students’ performance may have been affected by their desire to please the investigator, who was also their professor. However, if that were true, one might expect an across-the-board increase in performance rather than a stronger effect for those with higher LSAT scores/UGPAs.

Looking at this study’s results in light of Christensen’s study about mastery versus performance-oriented learners raises questions about whether practice materials inadvertently encourage performance-oriented goals rather than deeper mastery learning. In other words, do practice materials support those whose main goal is to get higher course grades rather than assisting those who wish to truly comprehend and master the content? While these goals are not necessarily mutually exclusive, professors typically prefer to downplay grade goals in favour of mastery goals. This leaves a lingering question about whether formative assessments are a positive addition to law school assessment culture. Fortunately, there is significant work in the literature suggesting that the nature of exam questions strongly influences student study practices and learning approaches, so if the practice exam questions call for deep learning, students will adopt deep approaches to learning.

This study had a relatively small sample size, especially when sub-dividing participants into the top two-thirds and bottom one-third of the class by certain measures, so the results should be treated with caution.

The results add to the previous study, reinforcing the power of formative assessments in law classes. The data show that formative assessments can improve students’ final exam scores for a majority of students, and that some students with weak first year grades may catch up to their peers with feedback. However, the benefit seems to accrue disproportionately to students who are in the top two-thirds in terms of LSAT/UGPA, perhaps due to their desire or ability to adjust to feedback, their higher confidence in their own ability to effectively use feedback, and their ability to better self-monitor and calibrate their comprehension.

While students with both above and below the median first-year law school grades improved, our tests did not detect learning advantages for one-third of the students (those with the lowest LSAT scores and UGPAs). It is unclear whether additional practice, more extensive reflective exercises, or different kinds of practice could provide benefits to that last one-third of the class.

In sum, we have provided new evidence that shifting the law school culture away from a single summative assessment may advantage students. We believe this work highlights a win-win that should advance the wide-scale experimentation and adoption of good formative assessment practices in law classes.

