Legal Education Digest

Todd, P --- "Plagiarism detection software: legal and pedagogical issues" [2011] LegEdDig 8; (2011) 19(1) Legal Education Digest 27

Plagiarism detection software: legal and pedagogical issues

P Todd

Law Teacher, Vol. 44, No. 2, 2010, pp137-148

If universities are to continue to use essay-based coursework as a basis of assessment, we have to have confidence in it. The ease with which students can compile essays from Internet-based sources is a known and probably growing problem. Knowing of its existence, as professionals we are surely under an obligation to employ the best available techniques to counter it. In principle, the Internet can be used to conquer plagiarism at least as successfully as it has, up to now, facilitated it. But using the Internet to fight a problem exacerbated by the Internet raises issues of both a legal and a pedagogical nature. It is these issues which are examined in this article.

It is necessary to dispel the myth that plagiarism detection software is effective only at combating cutting and pasting from Internet sources, and not, for example, the more traditional types of plagiarism, such as copying passages from books. It may not happen for other reasons, but from a purely technical standpoint it should be possible to counter all types of plagiarism, at any rate in essay-based coursework, and put ourselves into a better position than we were in during the pre-Internet era.

Nor are the uses of plagiarism detection software limited to the matter of guarding against the known and obvious pitfalls of plagiarism. It can assist examiners by showing how essays are constructed, whether or not they are technically plagiarised. It can be useful in supervision and examination of theses. Plagiarism detection software can be useful for the students themselves, before finally submitting their work (arguably they should already know where their essays are sourced from, but poor note-taking may lead to mistakes and failures of attribution).

One way of avoiding plagiarism is to set tasks which make plagiarism more difficult. We may nonetheless legitimately conclude that there are skills which can be best assessed using the traditional long essay, written over an extended period – it is, after all, essentially what we are doing when we write our own academic pieces. If that is accepted, then the issue is whether we should be forced by the cheats to abandon an assessment practice for which there are sound pedagogical justifications. Surely not, if there are other ways of countering the problem.

In order to use plagiarism detection software, it is necessary for the marker to have the essays in a digital form. Alternatively, scanning and OCR software can be used, but the OCR software needs to be almost 100 per cent accurate to be of value. With the continuing improvement of screens, and the increasing familiarity of academics with on-screen reading and editing of email, it seems difficult to believe that objections to online submission are sustainable, other than in the very short term.

A fear held by students is that we will not use the software properly, and that we will place too much reliance on the machine. Clearly, it is our responsibility to act professionally, to allay these fears.

There are three main techniques used by the software packages currently available. First (fairly obviously), there are those which employ search engine techniques, to find matches on the Internet. Secondly, there are those which find similarities between files on a single computer; these are intended primarily to detect collusion. Thirdly, there are those, of which Turnitin is the best-known example, which build up their own archive databases from past essay submissions, and agreements with publishers. It is this third type which provides us with the tools to defeat plagiarism from any source, whether or not that source is Internet-derived.

Many packages use only the first or only the second technique. For example, EVE2 only finds matches on the Internet, whereas CopyCatch Gold and WCopyfind are collusion-detectors. Turnitin uses the first and third techniques (and it is also possible to use its archive to check for collusion). Viper uses all three.

Plagiarism from an Internet source can sometimes be detected simply using a search engine, such as Google, especially if the plagiarised source uses unusual words or language. A marker restricted to using Google will be working at a handicap, however, quite apart from the sheer hassle of making what could be many searches for each essay submitted. Nonetheless, software that automatically tests every sentence of a file using a search engine can be surprisingly informative.

Running the same essay through different packages, such as Viper and Turnitin, will produce different results. Because it is not possible to search the whole of the Internet in an acceptable period of time, various devices are used to cut down the search. Search engines use indexes, and order their results on a probabilistic basis. But because most people who use Google are not looking for plagiarised sources, one cannot expect the search to be optimised for that activity. Software written explicitly for plagiarism detection should be able to index more effectively (after all, only a very small part of the Internet is likely to be useful as a plagiarism source), and other techniques are also used to reduce search times. Probably the most effective tools will be subject-specific, and it is notable that Viper asks, for each document tested, for a subject category. Search times can also be very much longer (and hence find more) than would be appropriate for a Google search, especially if the marker is organised and performs the search while doing something else, or batch-processes the essays. Nonetheless, because only a small part of the Internet is actually searched, even the most blatant verbatim copying can sometimes go undetected.

There is also the issue of what is searched for. Restricting the search to an exact string will not catch the student who makes minor changes to a plagiarised passage, whereas not so restricting it can result in many false leads. A software package needs to be able to compare passages of realistic lengths, and not be fooled by minor differences between the target and the checked passage. If the report allows the marker to easily compare suspicious parts of the essay with original sources, the match need not be particularly exact, especially if the software is to be of value in discovering how students construct essays, as well as in detecting plagiarism strictly so defined.
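The trade-off between exact and inexact matching can be illustrated with a minimal sketch. It is not the method of any package named here; it assumes one common approach, in which each text is broken into overlapping runs of n words ("shingles") and the proportion of shared runs is measured. The function names, the choice of n = 3 and the sample sentences are all illustrative assumptions.

```python
import re

def shingles(text, n=3):
    """Return the set of n-word runs in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def containment(passage, source, n=3):
    """Fraction of the passage's n-word runs that also occur in the source."""
    p = shingles(passage, n)
    if not p:
        return 0.0
    return len(p & shingles(source, n)) / len(p)

# Hypothetical texts: the copied sentence changes one word of the source.
source = "The court held that the archiving of student essays was a fair use of the works."
copied = "The court held that the archiving of student papers was a fair use of the works."
unrelated = "Problem questions with short deadlines make copying much harder for students."

print(copied in source)                # an exact-string test misses the copy
print(containment(copied, source))     # most three-word runs still match
print(containment(unrelated, source))  # no shared runs at all
```

A one-word substitution defeats the exact-string test but leaves most of the three-word runs intact, whereas the unrelated sentence shares none. Lowering n catches more heavily disguised copying at the cost of more false leads, which is why the report still has to be read by the marker.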

Even packages which only find material sourced from the Internet should become more effective as more material is made available online, unless digital rights management techniques are used to protect such material.

Nonetheless, quite a lot of plagiarised material is not accessible on the Internet. Some packages, for example Turnitin and Viper, archive submitted essays, so that future submissions can be compared, not just against Internet sources, but also against these archives. A passage copied from a textbook or other source, which is accessible in printed form only, will not be picked up in the first essay submitted. All future essays using the same source will appear similar, however, not to the original source, but to the first essay submitted. As the archive grows, this technique can be expected to capture most sources, including not only books, but also essay banks. Turnitin also has arrangements with publishers, enabling it to further increase the comprehensiveness of its archive databases. It would be a great mistake to assume that plagiarism detection software is effective only against the copier and paster from the Internet.
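The archive effect described above can be sketched in a few lines. This is a toy model under stated assumptions, not a description of how Turnitin or Viper store or compare essays: the class name `EssayArchive`, the four-word runs, the 0.3 threshold and the sample texts are all hypothetical.

```python
import re

def shingles(text, n=4):
    """Return the set of n-word runs in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

class EssayArchive:
    """Toy archive: each submission is checked against earlier ones, then stored."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed cut-off for reporting a match
        self.stored = {}            # essay id -> set of word runs

    def submit(self, essay_id, text):
        """Return ids of earlier essays sharing enough word runs, then archive this one."""
        new = shingles(text)
        matches = [
            other_id
            for other_id, old in self.stored.items()
            if new and len(new & old) / len(new) >= self.threshold
        ]
        self.stored[essay_id] = new
        return matches

# A passage assumed to exist only in a printed textbook, never online.
textbook_passage = "a contract is formed when an offer is accepted unconditionally by the offeree"

archive = EssayArchive()
first = archive.submit("essay-2009", "In my view " + textbook_passage + " as the cases show.")
second = archive.submit("essay-2010", "It is clear that " + textbook_passage + " in most situations.")

print(first)   # nothing in the archive yet, so the first copier is not caught
print(second)  # the later essay matches the earlier essay, not the textbook
```

The first essay passes unnoticed because the archive is empty; the second is flagged against the first. The report therefore points to a fellow student's essay rather than to the true source, and it is for the marker to work out that both drew on the same book.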

Moreover, the more we use the software, the more effective it becomes: each submission enlarges the archives and helps the software writers to optimise their searches. Care and professional judgement are nonetheless needed to interpret the results. For example, Turnitin’s ‘Overall Similarity Index’, which records the percentage of the essay which matches an Internet or archived source, means very little on its own, and a marker has to read the report very carefully to evaluate it. Many legal phrases, statutory provisions, etc., will naturally be on the Internet, and an ‘Overall Similarity Index’ of zero (even assuming the option has been taken to exclude direct quotes) would be neither expected nor indeed desirable. To some extent, perfectly legitimate paraphrases might also be caught, or conceivably the adoption of a writer’s views, but in an original context. It properly remains the role of the academic, and not the machine, to make the final judgement.
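Why a raw percentage means so little can be seen from a rough sketch of how such a figure might be computed. This is emphatically not Turnitin's algorithm, which is not described here; it assumes the simplest possible definition (the percentage of essay words lying inside a run of n words that also appears in a known source), and the function names, the treatment of quotation marks and the sample essay are illustrative assumptions.

```python
import re

def tokenize(text):
    """Lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def strip_quotes(text):
    """Remove passages enclosed in double quotation marks."""
    return re.sub(r'"[^"]*"', " ", text)

def similarity_index(essay, sources, n=3, exclude_quotes=False):
    """Percentage of essay words lying inside an n-word run found in any source."""
    if exclude_quotes:
        essay = strip_quotes(essay)
    words = tokenize(essay)
    if not words:
        return 0.0
    known = set()
    for src in sources:
        s = tokenize(src)
        known.update(tuple(s[i:i + n]) for i in range(len(s) - n + 1))
    covered = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in known:
            for j in range(i, i + n):
                covered[j] = True
    return 100.0 * sum(covered) / len(words)

sources = ["no court will lend its aid to a man who founds his cause of action "
           "upon an immoral or an illegal act"]
essay = ('The principle is old. "No court will lend its aid to a man who founds '
         'his cause of action upon an immoral or an illegal act" remains good law today.')

print(similarity_index(essay, sources))                       # high, though properly quoted
print(similarity_index(essay, sources, exclude_quotes=True))  # falls once quotes are excluded
```

On this toy definition, an essay consisting largely of one properly attributed quotation scores highly, and the score collapses when quoted material is excluded. Neither number says anything about whether the student has plagiarised; only a reading of the matched passages can do that.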

Any marker will check for plagiarism, as thoroughly as time and other resources allow. It is reasonable for any student submitting anything for assessment to expect this, if only to ensure that the assessment is as fair as possible.

There has been litigation in the United States, however, about the archiving of essays. In AV v iParadigms LLC, high school students sued iParadigms, the producers of Turnitin, claiming that the archiving of their essays amounted to a breach of their copyright in them. iParadigms claimed that they were entitled to the defence of fair use, and also that the students, by clicking on an ‘I Agree’ button when they created their user profiles to submit essays to Turnitin, had consented to the use. In the US Court of Appeals, iParadigms succeeded on the fair use issue, and the court did not need to consider the issue of consent.

If a student objected because he or she intended later to sell the essay to an essay bank, it is likely that the public interest defence would apply, even on the narrow test adopted by Aldous LJ in Hyde Park Residence Ltd v Yelland. It must be strongly arguable that an essay bank encourages deception for gain, and ‘[no] court will lend its aid to a man who founds his cause of action upon an immoral or an illegal act’. However, the objecting student could argue that an essay bank provides essays only as guidelines and models for future students, in which case the public interest defence would have no application. In any case the student may have other objections to this use of his or her copyright, and it cannot be certain that a public interest defence would apply.

That is not the end of the problem. A student whose essay contains an appropriate proportion of quotes from elsewhere, properly acknowledged, will not infringe the copyright of the author quoted, but a plagiarised essay will, and the archiving might therefore infringe third party rights. Again, it would be wise, if possible, to deal with this through consent, and it is probable that many publishers would indeed consent to allowing their material to be used in the fight against plagiarism.

Consent might not always be obtainable, however, an obvious category of objectors being writers of essays intended for sale in paper mills, and the owners of such sites. The writers of such essays would not necessarily be current university students, and their consent might therefore be difficult to obtain. The application of the public interest defence would be the same for paper mills as already observed for essay banks, and, given its fragility, it would be wise for software designers to design so as to be able to exclude essays where plagiarism is identified, as well as material where stringent objection is taken, by the copyright owners, to its use.

One of the objections taken by the students, in AV v iParadigms LLC, to the archiving of their essays, was that if they later submitted the same work to a literary journal it would appear to be plagiarised, though their own work. The District Court, whose view was upheld in the US Court of Appeals, had said:

Anyone who is reasonably familiar with Turnitin’s operation will be able to recognise that the identical match is not the result of plagiarism, but simply the result of Plaintiff’s earlier submission. Individuals familiar with Turnitin, such as those in the field of education, would be expecting the works submitted to have been previously submitted.

If this reasoning is convincing, it is another example of the care that needs to be taken when considering a Turnitin (or similar) report. The machine does not relieve us of our obligation to exercise a professional judgement.

It is possible to counter plagiarism by setting tasks which make plagiarism more difficult. Jude Carroll, for example, suggests (among other things) making tasks more individual, requiring more individualised answers, requiring a signed statement of originality, assessing the process as well as the final product, requiring submission of drafts, asking for an outline or list of resources instead of a finished product, and thinking carefully about assessment titles, for example by asking about narrow and recent topics. In law, from my own experience I know that we can set problem questions, changing them each time the assignment is set, and use very short deadlines, making plagiarism more difficult.

In any case, as has already been observed, plagiarism detection software has uses other than countering plagiarism alone. Whether or not we alter the assessment tasks we set, we should not turn our backs on a useful tool.

If our arsenal includes routine use of plagiarism detection software, that will force us to think more carefully about how we teach and assess. At the very least, information on how students construct essays will inform us how to set future work. We may find, for example, heavy reliance on a particular source, and set future assignments to discourage its use. If students know in advance that their essays will be tested, and in particular if they can see the reports, it is unlikely that they will be tempted to cut and paste substantial amounts of material, but they might instead be tempted to try to reduce their ‘Overall Similarity Index’ (or equivalent) as far as possible. To some extent this activity might be a useful exercise, but students should be warned against taking it too far. Unless the software becomes sufficiently sophisticated to ignore standard legal phrases, students should be advised not to aim for a zero score.

Plagiarism requires at a minimum the copying of a text document from another source without acknowledgment. Various motivations are possible:

Often, students have not understood the conventions of academic writing or have not yet learned to use the skills of citation, paraphrasing and using others’ ideas to underpin their own arguments. Sometimes, students deliberately break the rules, choose to cheat, and have little or no interest in upholding the values that underpin academic integrity.

Whatever the motivation, plagiarism constitutes bad work, but if it results from a failure of understanding, or from time pressure or incompetence, it may not be appropriate to penalise it further. The reason why plagiarised work cannot simply be treated as bad work is that there are also more sinister motivations: ‘plagiarism is an act of fraud. It involves both stealing someone else’s work and lying about it afterward’. What causes our concern is the student who deliberately passes off another’s work as his or her own, pretending to a merit that he or she does not possess. We take plagiarism so seriously, and punish it, because the motivation might be of the second type, rather than the first. It is this motivation which threatens the integrity of our assessment practices. The possibility of bad work does not; we simply accord the work an appropriate mark.

If plagiarism detection software becomes sufficiently effective, the variety of plagiarism which constitutes cheating will become literally impossible. Suspicions of dishonesty will no longer have any part to play in the marking process, which will be geared solely toward assessing the quality of the work. Copying will be evidence of incompetence, rather than dishonesty. Indeed, once the taint of dishonesty is removed from the equation, we might even place a value on the ability simply to find material effectively on the Internet (while the value we place on this skill may not be high, we cannot entirely deny its utility in the modern world).

Plagiarism detection software will never be able to defeat the determined and well-funded cheat. Students will still be able to buy bespoke essays, written by others on their behalf, and if they are never re-used they will continue to go undetected. Since these essays will lose their value after just one use, and since the ghost-writers will know that their own work will be tested, it may be supposed that this form of cheating will become more expensive, and therefore rare. We may prevent only 95 per cent or 99 per cent of cheating rather than 100 per cent, but that is a lot better than nothing. That is, after all, the basis of much crime prevention.

We can probably turn the tables in the plagiarism arena pretty effectively even if not totally. We can enjoy the other advantages of plagiarism detection software, apart from its intended and obvious use. But far from abandoning our professional judgement to a machine, we will have to use the software carefully, and to think harder about what exactly we are examining. Universities and the software providers will have to address the copyright issues which archiving, at least, throws up. Software providers may have to counter new techniques for cheats, as yet undevised.
